Advanced High Dynamic Range Imaging

Francesco Banterle Alessandro Artusi


Kurt Debattista Alan Chalmers
Foreword by Holly Rushmeier

High dynamic range (HDR) imaging is the term given to the capture, storage, manipulation, transmission, and display of images that more accurately represent the wide range of real-world lighting levels. With the advent of a true HDR video system and more than 20 years of experience in creating static HDR images, HDR is finally ready to enter the mainstream of imaging technology. This book provides a comprehensive practical guide to facilitate the widespread adoption of HDR technology. By examining the key problems associated with HDR imaging and providing detailed methods to overcome these problems, the authors hope readers will be inspired to adopt HDR as their preferred approach for imaging the real world. Key HDR algorithms are provided as MATLAB code as part of the HDR Toolbox.

This book provides a practical introduction to the emerging new discipline of high
dynamic range imaging that combines photography and computer graphics. . . By
providing detailed equations and code, the book gives the reader the tools needed
to experiment with new techniques for creating compelling images.
From the Foreword by Holly Rushmeier, Yale University

Download MATLAB
source code for the book at
www.advancedhdrbook.com


Advanced
High Dynamic Range
Imaging
Theory and Practice

Francesco Banterle
Alessandro Artusi
Kurt Debattista
Alan Chalmers

A K Peters, Ltd.
Natick, Massachusetts

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2011 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20120202
International Standard Book Number-13: 978-1-4398-6594-1 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher
cannot assume responsibility for the validity of all materials or the consequences of their use. The
authors and publishers have attempted to trace the copyright holders of all material reproduced in
this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so
we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.
copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been
granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com

To my parents. FB
Dedicated to all of you: Franca, Nella, Sincero, Marco, Giancarlo,
and Despo. You are always in my mind. AA
To Alex. Welcome! KD
To Eva, Erika, Andrea, and Thomas. You are my reality! AC


Contents

1  Introduction                                                1
   1.1  Light, Human Vision, and Color Spaces                  4

2  HDR Pipeline                                               11
   2.1  HDR Content Generation                                 12
   2.2  HDR Content Storing                                    22
   2.3  Visualization of HDR Content                           26

3  Tone Mapping                                               33
   3.1  TMO MATLAB Framework                                   36
   3.2  Global Operators                                       38
   3.3  Local Operators                                        61
   3.4  Frequency-Based Operators                              75
   3.5  Segmentation Operators                                 86
   3.6  New Trends to the Tone Mapping Problem                103
   3.7  Summary                                               112

4  Expansion Operators for Low Dynamic Range Content         113
   4.1  Linearization of the Signal Using a Single Image      115
   4.2  Decontouring Models for High Contrast Displays        119
   4.3  EO MATLAB Framework                                   121
   4.4  Global Models                                         122
   4.5  Classification Models                                 128
   4.6  Expand Map Models                                     134
   4.7  User-Based Models: HDR Hallucination                  144
   4.8  Summary                                               145

5  Image-Based Lighting                                       149
   5.1  Environment Map                                       149
   5.2  Rendering with IBL                                     155
   5.3  Summary                                               174

6  Evaluation                                                 175
   6.1  Psychophysical Experiments                            175
   6.2  Error Metric                                          187
   6.3  Summary                                               190

7  HDR Content Compression                                    193
   7.1  HDR Compression MATLAB Framework                      193
   7.2  HDR Image Compression                                 194
   7.3  HDR Texture Compression                               205
   7.4  HDR Video Compression                                 218
   7.5  Summary                                               225

A  The Bilateral Filter                                       227

B  Retinex Filters                                            231

C  A Brief Overview of the MATLAB HDR Toolbox                 233

Bibliography                                                  239

Index                                                         258

Foreword

We perceive the world through the scattering of light from objects to our
eyes. Imaging techniques seek to simulate the array of light that reaches our
eyes to provide the illusion of sensing scenes directly. Both photography
and computer graphics deal with the generation of images. Both disciplines
have to cope with the high dynamic range in the energy of visible light that
human eyes can sense. Traditionally photography and computer graphics
took different approaches to the high dynamic range problem. Work over
the last ten years, though, has unified these disciplines and created powerful
new tools for the creation of complex, compelling, and realistic images.
This book provides a practical introduction to the emerging new discipline
of high dynamic range imaging that combines photography and computer
graphics.
Historically, traditional wet photography managed the recording of high
dynamic range imagery through careful design of camera optics and the
material layers that form film. The ingenious processes that were invented
enabled the recording of images that appeared identical to real-life scenes.
Further, traditional photography facilitated artistic adjustments by the
photographer in the darkroom during the development process. However,
the complex relationship between the light incident on the film and the
chemistry of the material layers that form the image made wet photography
unsuitable for light measurement.
The early days of computer graphics also used ingenious methods to
work around two physical constraints: inadequate computational capabilities
for simulating light transport and display devices with limited dynamic
range. To address the limited computational capabilities, simple heuristics
such as Phong reflectance were developed to mimic the final appearance
of objects. By designing heuristics appropriately, images were computed
that always fit the narrow display range. It wasn't until the early 1980s
that computational capability had increased to the point that full lighting
simulations were possible, at least on simple scenes.
I had my own first experience with the yet-unnamed field of high dynamic range imaging in the mid-1980s. I was studying one particular approach to lighting simulation: radiosity. I was part of a team that designed
experiments to demonstrate that the lengthy computation required for full
lighting simulation gave results superior to results using simple heuristics.
Naively, several of us thought that simply photographing our simulated
image from a computer screen and comparing it to a photograph of a real
scene would be a simple way to demonstrate that our simulated image was
more accurate. Our simple scene, now known as the Cornell box, was just
an empty cube with one blue wall, one red wall, a white wall, a floor and
ceiling, and a flat light source that was flush with the cube ceiling. We
quickly encountered the complexity of film processing. For example, the
very red light from our tungsten light source, when reflected from a white
surface, looked red on film, if we used the same film to image our computer screen and the real box. Gary Meyer, a senior member of the team
who was writing his dissertation on color in computer graphics, patiently
explained to us how complicated the path was from incident light to the
recorded photographic image.
Since we could not compare images with photography, and we had no
digital cameras at the time, we could only measure light directly with a
photometer that measured light over a broad range of wavelengths and incident angles. Since this gave only a crude evaluation of the accuracy of
the lighting simulation, we turned to the idea of having people view the
simulated image on the computer screen and the real scene directly through
view cameras to eliminate obvious three-dimensional cues. However, here
we encountered the dynamic range problem since viewing the light source
directly impaired the perception of the real scene and simulated scene together. Our expectation was that the two would look the same, but color
constancy in human vision wreaked havoc with simultaneously displaying
a bright red tungsten source and the simulated image with the light source
clipped to monitor white. Our solution at that time for the comparison
was to simply block the direct view of the light source in both scenes. We
successfully showed that in images with limited dynamic range, our simulations were more accurate when compared to a real scene than previous
heuristics, but we left the high dynamic range problem hanging.
Through the 1980s and 1990s lighting simulations increased in efficiency
and sophistication. Release of physically accurate global illumination software such as Greg Ward's Radiance made such simulations widely accessible. For a while users were satisfied to scale and clip computed values
in somewhat arbitrary ways to map the high dynamic range of computed
imagery to the low dynamic range cathode ray tube devices in use at the
time. Jack Tumblin, an engineer who had been working on the problem of
presenting high dynamic range images in flight simulators, ran across the
work in computer graphics lighting simulation and assumed that a principled way to map physical lighting values to a display had been developed
in computer graphics. Finding out that in fact there was no such principled
approach, he began mining past work in photography and television that
accounted for human perception in the design of image capture and display
systems, developing the first tone mapping algorithms in computer graphics. Through the late 1990s the research community began to study alternative tone mapping algorithms and to consider their usefulness in increasing the efficiency of global illumination calculations for image synthesis.
At the same time, in the 1980s and 1990s the technology for the electronic recording of digital images steadily decreased in price and increased
in ease of use. Researchers in computer vision and computer graphics, such
as Paul Debevec and Jitendra Malik at Berkeley, began to experiment with
taking series of digital images at varying exposures and combining them
into true high dynamic range images with accurate recordings of the incident light. The capability to compute and capture true light levels opened
up great possibilities for unifying computer graphics and computer vision.
Compositing real images with synthesized images having consistent lighting
effects was just one application. Examples of other processes that became
possible were techniques to capture real lighting and materials with digital
photography that could then be used in synthetic images.
With new applications made possible by unifying techniques from digital photography and accurate lighting simulation came many new problems
to solve and possibilities to explore. Tone mapping was found not to be
a simple problem with just one optimum solution but a whole family of
problems. There are different possible goals: images that give the viewer
the same visual impression as viewing the physical scene, images that are
pleasing, or images that maximize the visibility of detail. There are many
different contexts, such as dynamic scenes and low-light conditions. There
is a great deal of low dynamic range imagery that has been captured and
generated in the past; how can this be expanded to be used in the same
context as high dynamic range imagery? What compression techniques can
be employed to deal with the increased data generated by high dynamic
range imaging systems? How can we best evaluate the fidelity of displayed
images?
This book provides a comprehensive guide to this exciting new area. By
providing detailed equations and code, the book gives the reader the tools
needed to experiment with new techniques for creating compelling images.
Holly Rushmeier
Yale University


Preface

The human visual system (HVS) is remarkable. Through the process of eye
adaptation, our eyes are able to cope with the wide range of lighting in the
real world. In this way we are able to see enough to get around on a starlit
night and can clearly distinguish color and detail on a bright sunny day.
Even before the first permanent photograph in 1826 by Joseph Nicéphore
Niépce, camera manufacturers and photographers have been striving to
capture the same detail a human eye can see. Although a color photograph
was achieved as early as 1861 by James Maxwell and Thomas Sutton [130],
and an electronic video camera tube was invented in the 1920s, the ability
to simultaneously capture the full range of lighting that the eye can see
at any level of adaptation continues to be a major challenge. The latest
step towards achieving this holy grail of imaging was in 2009 when a
video camera capable of capturing 20 f-stops (1920 × 1080 resolution) at
30 frames a second was shown at the annual ACM SIGGRAPH conference
by the German high-precision camera manufacturer Spheron VR and the
International Digital Laboratory at the University of Warwick, UK.
High dynamic range (HDR) imaging is the term given to the capture,
storage, manipulation, transmission, and display of images that more accurately represent the wide range of real-world lighting levels. With the
advent of a true HDR video system, and from the experience of more
than 20 years of static HDR imagery, HDR is finally ready to enter the
mainstream of imaging technology. The aim of this book is to provide
a comprehensive practical guide to facilitate the widespread adoption of
HDR technology. By examining the key problems associated with HDR
imaging and providing detailed methods to overcome these problems, together with supporting Matlab code, we hope readers will be inspired to
adopt HDR as their preferred approach for imaging the real world.


Advanced High Dynamic Range Imaging covers all aspects of HDR imaging from capture to display, including an evaluation of just how closely the
results of HDR processes are able to recreate the real world. The book
is divided into seven chapters. Chapter 1 introduces the basic concepts.
This includes details on the way a human eye sees the world and how this
may be represented on a computer. Chapter 2 sets the scene for HDR
imaging by describing the HDR pipeline and all that is necessary to capture real-world lighting and then subsequently display it. Chapters 3 and 4
investigate the relationship between HDR and low dynamic range (LDR)
content and displays. The numerous tone mapping techniques that have
been proposed over more than 20 years are described in detail in Chapter 3. These techniques tackle the problem of displaying HDR content in
a desirable manner on LDR displays. In Chapter 4, expansion operators,
generally referred to as inverse (or reverse) tone mappers (iTMOs), are
considered; these address the opposite problem: how to expand LDR content for
display on HDR devices. A major application of HDR technology, image-based
lighting (IBL), is considered in Chapter 5. This computer graphics
approach enables real and virtual objects to be relit by HDR lighting that
has been previously captured. So, for example, the CAD model of a car
may be lit by lighting previously captured in China to allow a car designer
to consider how a particular paint scheme may appear in that country.
Correctly applied IBL can thus allow such hypothesis testing without the
need to take a physical car to China. Another example could be actors
being lit accurately as if they were in places they have never been. Many
tone mapping and expansion operators have been proposed over the years.
Several of these attempt to create as accurate a representation of the real
world as possible within the constraints of the LDR display or content.
Chapter 6 discusses methods that have been proposed to evaluate just how
successful tone mappers have been in displaying HDR content on LDR devices and how successful expansion methods have been in generating HDR
images from legacy LDR content. Capturing real-world lighting generates
a large amount of data. The HDR video camera shown at SIGGRAPH
requires 24 MB per frame, which equates to almost 42 GB for a minute
of footage (compared with just 9 GB for a minute of LDR video). The final chapter of Advanced High Dynamic Range Imaging examines the issues
of compressing HDR imagery to enable it to be manageable for storage,
transmission, and manipulation and thus practical on existing systems.

Introduction to MATLAB
Matlab is a powerful numerical computing environment. Created in the
late 1970s and subsequently commercialized by The MathWorks, Matlab
is now widely used across both academia and industry. The interactive

Preface

xv

nature of Matlab allows it to rapidly demonstrate many algorithms in


an intuitive manner. It is for this reason we have chosen to include the
key HDR algorithms as Matlab code as part of what we term the HDR
Toolbox. An overview of the HDR Toolbox is given in Appendix C. In
Advanced High Dynamic Range Imaging, the common parts of Matlab
code are presented at the beginning of each chapter. The remaining code
for each technique is then presented at the point in the chapter where the
technique is described. The code always starts with the input parameters
that the specific method requires.
For example, in Listing 1, the code segment for the Schlick tone mapping
operator, the method takes the following parameters as input: schlick_mode
specifies the type of model of the Schlick technique used. We may

if(~exist('schlick_mode')|~exist('schlick_p')|~exist('schlick_bit')|...
   ~exist('schlick_dL0')|~exist('schlick_k'))
    schlick_mode = 'standard';
    schlick_p = 1/0.005;
end

%Max Luminance value
LMax = max(max(L));
%Min Luminance value
LMin = min(min(L));
if(LMin <= 0.0)
    ind = find(L > 0.0);
    LMin = min(min(L(ind)));
end

%Mode selection
switch schlick_mode
    case 'standard'
        p = schlick_p;
        if(p < 1)
            p = 1;
        end
    case 'calib'
        p = schlick_dL0 * LMax / (2^schlick_bit * LMin);
    case 'nonuniform'
        p = schlick_dL0 * LMax / (2^schlick_bit * LMin);
        p = p * (1 - schlick_k + schlick_k * L / sqrt(LMax * LMin));
end

%Dynamic Range Reduction
Ld = p .* L ./ ((p - 1) .* L + LMax);

Listing 1. Matlab Code: Schlick TMO [189].


have three cases: standard, calib, and nonuniform modes. The standard
mode takes the parameter p as input from the user, while the calib and
nonuniform modes use the uniform and nonuniform quantization
techniques, respectively. The variable schlick_p is the parameter p or
p' depending on the mode used, schlick_bit is the number of bits N
of the output display, schlick_dL0 is the parameter ∆L0, and schlick_k
is the parameter k. The first step is to extract the luminance channel
from the image and the maximum, LMax, and the minimum luminance,
LMin. These values can be used for calculating p. Afterwards, based on
the selection mode, one of the three modalities is chosen and the parameter
p either is given by the user (standard mode) or is equal to Equation (3.9)
or to Equation (3.10). Finally, the dynamic range of the luminance channel
is reduced by applying Equation (3.8).
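As a usage illustration, and not as part of the Toolbox listing above, the following minimal sketch applies the Schlick reduction of Equation (3.8) directly in the standard mode. Here img is assumed to be a linear HDR RGB image and lum a helper that extracts its luminance channel; both names are illustrative rather than prescribed.

% Minimal usage sketch (assumed names: img is a linear HDR RGB image,
% lum() extracts its luminance channel).
L    = lum(img);                             % luminance channel
LMax = max(L(:));                            % maximum luminance
p    = 1/0.005;                              % 'standard' mode parameter p
Ld   = p .* L ./ ((p - 1) .* L + LMax);      % Equation (3.8)

% Scale the color channels by the luminance ratio for display
% (a small guard avoids division by zero luminance).
imgTMO = img .* repmat(Ld ./ max(L, 1e-6), [1 1 3]);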

Acknowledgements
Many people provided help and support during my doctoral research and
the writing of this book. Special thanks go to the wonderful colleagues,
staff, and professors I met during this time in Warwick and Bristol: Patrick,
Kurt, Alessandro, Alan, Karol, Kadi, Luis Paulo, Sumanta, Piotr, Roger,
Matt, Anna, Cathy, Yusef, Usama, Dave, Gav, Veronica, Timo, Alexa, Marina, Diego, Tom, Jassim, Carlo, Elena, Alena, Belma, Selma, Jasminka,
Vedad, Remi, Elmedin, Vibhor, Silvester, Gabriela, Nick, Mike, Giannis,
Keith, Sandro, Georgina, Leigh, John, Paul, Mark, Joe, Gavin, Maximino,
Alexandrino, Tim, Polly, Steve, Simon, and Michael. The VCG Laboratory ISTI-CNR generously gave me time to write and were supportive
colleagues.
I am heavy with debt for the support I have received from my family.
My parents, Maria Luisa and Renzo; my brother Piero and his wife, Irina;
and my brother Paolo and his wife, Elisa. Finally, for her patience, good
humor, and love during the writing of this book, I thank Silvia.
Francesco Banterle

This book started many years ago when I decided to move from Color
Science to Computer Graphics. Thanks to this event, I had the opportunity to move to Vienna and chose to work in the HDR field. I am very
grateful to Werner Purgathofer who gave me the possibility to work and
start my PhD at the Vienna University of Technology and also the chance
to know Meister Eduard Groeller. I am grateful to my coauthors: Alan
Chalmers gave me the opportunity to share with him this adventure that
started in a taxi driving back from the airport during one of our business


trips; also, we have shared the foundation of goHDR, which has been another important activity, and we are progressively starting to see the results day
by day. Kurt Debattista and Francesco Banterle are two excellent men
of science, and from them I have learned many things. At the Warwick
Digital Laboratory, I have had the possibility to share several professional
moments with young researchers; thanks to Vedad, Carlo, Jass, Tom, Piotr, Alena, Silvester, Vibhor, and Elmedin as well as many collaborators
such as Sumanta N. Pattanaik, Mateu Sbert, Karol Myszkowski, Attila and
Laszlo Neumann, and Yiorgos Chrusanthou. I would like to thank with all
my heart my mother, Franca, and grandmother Nella, who are always in
my mind. Grateful thanks to my father, Sincero, and brothers, Marco and
Giancarlo, as well as my fiancée, Despo; they have always supported my
work. Every line of this book, and every second I spent in writing it, is
dedicated to all of them.
Alessandro Artusi
First, I am very grateful to the three coauthors whose hard work has made
this book possible. I would like to thank my PhD students who are always willing to help and oer good, sound technical advice: Vibhor Aggarwal, Tom Bashford-Rogers, Keith Bugeja, Piotr Dubla, Sandro Spina, and
Elmedin Selmanovic. I would also like to thank the following colleagues,
many of whom have been an inspiration and with whom it has been a
pleasure working over the past few years at Bristol and Warwick: Matt
Aranha, Kadi Bouatouch, Kirsten Cater, Joe Cordina, Gabriela Czanner,
Silvester Czanner, Sara de Freitas, Gavin Ellis, Jassim Happa, Carlo Harvey, Vedad Hulusic, Richard Gillibrand, Patrick Ledda, Pete Longhurst,
Fotis Liarokapis, Cheng-Hung (Roger) Lo, Georgia Mastoropoulou, Antonis Petroutsos, Alberto Proenca, Belma Ramic-Brkic, Selma Rizvic, Luis
Paulo Santos, Simon Scarle, Veronica Sundstedt, Kevin Vella, Greg Ward,
and Xiaohui (Cathy) Yang. My parents have always supported me and I
will be eternally grateful. My grandparents were an inspiration and are
sorely missed; they will never be forgotten. Finally, I would like to wholeheartedly thank my wife, Anna, for her love and support and Alex, who
has made our lives complete.
Kurt Debattista
This book has come about after many years of research in the field and
working with a number of outstanding post-docs and PhD students, three
of whom are coauthors of this book. I am very grateful to all of them for
their hard work over the years. This research has built on the work of
the pioneers, such as Holly Rushmeier, Paul Debevec, Jack Tumblin, Helge


Seetzen, Gerhard Bonnet, and Greg Ward; together with the growing body
of work from around the world, it has taken HDR from a niche research
area into general use. HDR now stands at the cusp of a step change in
media technology, analogous to the change from black and white to color.
In the not-too-distant future, capturing and displaying real-world lighting
will be the norm, with an HDR television in every home. Many exciting
new research and commercial opportunities will present themselves, with
new companies appearing, such as our own goHDR, as the world embraces
HDR en masse. In addition to all my groups over the years, I would like to
thank Professor Lord Battacharrya and WMG, University of Warwick for
having the foresight to establish Visualisation as one of the key research
areas within their new Digital Laboratory. Together with Advantage West
Midlands, they provided the opportunity that led to the development, with
Spheron VR, of the world's first true HDR video camera. Christopher Moir,
Ederyn Williams, Mike Atkins, Richard Jephcott, Keith Bowen FRS, and
Huw Bowen share the vision of goHDR, and their enthusiasm and experience are making this a success. I would also like to thank the Eurographics
Rendering Symposium and SCCG communities, which are such valuable
venues for developing research ideas, in particular Andrej Ferko, Karol
Myszkowski, Kadi Bouatouch, Max Bessa, Luis Paulo dos Santos, Michi
Wimmer, Anders Ynnerman, Jonas Unger, and Alex Wilkie. Finally, thank
you to Eva, Erika, Andrea, and Thomas for all their love and support.
Alan Chalmers

1
Introduction

The computer graphics and related industries, in particular those involved


with films, games, simulation, virtual reality, and military applications,
continue to demand more realistic images displayed on a computer, that
is, synthesized images that more accurately match the real scene they are
intended to represent. This is particularly challenging when considering images of the natural world that present our visual system with a wide range
of colors and intensities. A starlit night has an average luminance level of
around 10^-3 cd/m2, and daylight scenes are close to 10^6 cd/m2. Humans
can see detail in regions that vary by 1:10^4 at any given eye adaptation
level. With the possible exception of cinema, there has been little push
for achieving greater dynamic range in the image capture stage, because
common displays and viewing environments limit the range of what can be
presented to about two orders of magnitude between minimum and maximum luminance. A well-designed cathode ray tube (CRT) monitor may
do slightly better than this in a darkened room, but the maximum display
luminance is only around 100 cd/m2, and in the case of an LCD display the
maximum luminance may reach 300-400 cd/m2, which does not even begin
to approach daylight levels. A high-quality xenon film projector may get
a few times brighter than this, but it is still two orders of magnitude away
from the optimal light level for human acuity and color perception. This is
now all changing with high dynamic range (HDR) imagery and novel capture and display HDR technologies, oering a step-change in traditional
imaging approaches.
In the last two decades, HDR imaging has revolutionized the field of
computer graphics and other areas such as photography, virtual reality,
visual effects, and the video game industry. Real-world lighting can now
be captured, stored, transmitted, and fully utilized for various applications

Figure 1.1. Different exposures of the same scene that allow the capture of
(a) very bright and (b) dark areas and (c) the corresponding HDR image in
false colors.

without the need to linearize the signal and deal with clamped values. The
very dark and bright areas of a scene can be recorded at the same time onto
an image or a video, avoiding under-exposed and over-exposed areas (see
Figure 1.1). Traditional imaging methods, on the other hand, do not use
physical values and typically are constrained by limitations in technology
that could only handle 8 bits per color channel per pixel. Such imagery
(8 bits or less per color channel) is known as low dynamic range (LDR)
imagery.
The importance of recording light is comparable to the introduction of
color photography. An HDR image may be generated by capturing multiple
images of the same scene at different exposure levels and merging them to
reconstruct the original dynamic range of the captured scene. There are
several algorithms for merging LDR images; Debevec and Malik's method
[50] is an example of this. An example of a commercial implementation is
the Spheron HDR VR [192] that can capture still spherical images with a
dynamic range of 6 × 10^7 : 1. Although information could be recorded in
one shot using native HDR CCDs, problems of sensor noise typically
occur at high resolution.
HDR images/videos may occupy four times the amount of memory required by corresponding LDR image content. This is because in HDR
images, light values are stored using three floating point numbers. This
has a major effect not only on storing and transmitting HDR data but
also in terms of processing it. As a consequence, efficient representations
of the floating point numbers have been developed for HDR imaging, and
many classic compression algorithms such as JPEG and MPEG have been
extended to handle HDR images and videos.
Once HDR content has been efficiently captured and stored, it can be
utilized for a variety of applications. One popular application is the relighting of synthetic or real objects. The HDR data stores detailed lighting
information of an environment. This information can be exploited for

Figure 1.2. A relighting example. (a) A spherical HDR image in false color.
(b) Light sources extracted from it. (c) A relit Stanford's Happy Buddha model
[78] using those extracted light sources.

detecting light sources and using them for relighting objects (see Figure 1.2).
Such relighting is very useful in many fields such as augmented reality, visual effects, and computer graphics. This is because the appearance of the
image is transferred onto the relit objects.
Another important application is to capture samples of the bidirectional reflectance distribution function (BRDF), which describes how light
interacts with a given material. These samples can be used to reconstruct the BRDF. HDR data is required for an accurate reconstruction (see

Figure 1.3. An example of capturing samples of a BRDF. (a) A tone mapped
HDR image showing a sample of the BRDF from a Parthenon block [199].
(b) The reconstructed materials in (a) from 80 samples for each of three exposures. (Images are courtesy of Paul Debevec [199].)

Figure 1.4. An example of HDR visualization on an LDR monitor. (a) An HDR
image in false color. (b) The image in (a) has been processed to visualize details
in bright and dark areas. This process is called tone mapping.

Figure 1.3). Moreover, all fields that use LDR imaging can benefit from
HDR imaging. For example, disparity calculations in computer vision can
be improved in challenging scenes with bright light sources. This is because
information in the light sources is not clamped; therefore, disparity can be
computed for light sources and reflective objects with higher precision than
using clamped values.
Once HDR content is obtained, it needs to be visualized. HDR images/videos do not typically fit the dynamic range of classic LDR displays
such as CRT or LCD monitors, which is around 200 : 1. Therefore, when
using such displays, the HDR content has to be processed by compressing
the dynamic range. This operation is called tone mapping (see Figure 1.4).
Recently, monitors that can natively visualize HDR content have been proposed by Seetzen et al. [190] and are now starting to appear commercially.

1.1 Light, Human Vision, and Color Spaces

This section introduces basic concepts of visible light and units for measuring it, the human visual system (HVS) focusing on the eye, and color spaces.
These concepts are very important in HDR imaging as they encapsulate
the physical-real values of light, from very dark values (i.e., 10^-3 cd/m2)
to very bright ones (i.e., 10^6 cd/m2). Moreover, the perception of a scene
by the HVS depends greatly on the lighting conditions.

1.1.1 Light
Visible light is a form of radiant energy that travels in space, interacting
with materials where it can be absorbed, refracted, reflected, and

Figure 1.5. (a) The three main light interactions: transmission, absorption, and
reflection. In transmission, light travels through the material, changing its direction according to the physical properties of the medium. In absorption, the
light is taken up by the material that was hit and it is converted into thermal
energy. In reflections, light bounces from the material in a different direction due
to the material's properties. There are two main kinds of reflections: specular
and diffuse. (b) Specular reflections: a ray is reflected in a particular direction.
(c) Diffuse reflections: a ray is reflected in a random direction.

transmitted (see Figure 1.5). Traveling light can reach human eyes, stimulating
them to produce visual sensations depending on the wavelength (see Figure 1.6).
Radiometry and Photometry define how to measure light and its units
over time, space, and direction. While the former measures physical units,
the latter takes into account the human eye, where spectral values are
weighted by the spectral responses of a standard observer (the x̄, ȳ, and z̄
curves). Radiometry and Photometry units were standardized by the Commission Internationale de l'Eclairage (CIE) [38]. The main radiometric
units are:
• Radiant energy (Q_e). This is the basic unit for light. It is measured
  in joules (J).

• Radiant power (P_e = dQ_e/dt). Radiant Power is the amount of energy
  that flows per unit of time. It is measured in watts (W); W = J s^-1.

• Radiant intensity (I_e = dP_e/dω). This is the amount of Radiant Power
  per unit of direction. It is measured in watts per steradian (W sr^-1).

• Irradiance (E_e = dP_e/dA_e). Irradiance is the amount of Radiant Power
  per unit of area from all directions of the hemisphere at a point. It
  is measured in watts per square meter (W m^-2).

• Radiance (L_e = d^2 P_e / (dA_e cos θ dω)). Radiance is the amount of Radiant
  Power arriving/leaving at a point in a particular direction. It is
  measured in watts per steradian per square meter (W sr^-1 m^-2).

Figure 1.6. The electromagnetic spectrum. The visible light has a very limited
spectrum between 400 nm and 700 nm.

The main photometric units are:

• Luminous power (P_v). Luminous Power is the weighted Radiant
  Power. It is measured in lumens (lm), a derived unit from candela
  (lm = cd sr).

• Luminous energy (Q_v). This is analogous to the Radiant Energy. It
  is measured in lumen seconds (lm s).

• Luminous intensity (I_v). This is the Luminous Power per direction.
  It is measured in candela (cd), which is equivalent to lm sr^-1.

• Illuminance (E_v). Illuminance is analogous to Irradiance. It is measured
  in lux, which is equivalent to lm m^-2.

• Luminance (L_v). Luminance is the weighted Radiance. It is measured
  in cd m^-2, equivalent to lm m^-2 sr^-1.
A measure of the relative luminance of the scene can be useful, since it
can illustrate some properties of the scene such as the presence of diffuse
or specular surfaces, lighting condition, etc. For example, specular surfaces
reflect light sources even if they are not visible directly in the scene, increasing the relative luminance. This relative measure is called contrast.
Contrast is formally a relationship between the darkest and the brightest
value in a scene, and it can be calculated in different ways. The main contrast relationships are Weber Contrast (C_W), Michelson Contrast (C_M),
and Ratio Contrast (C_R). These are defined as

C_W = \frac{L_{max} - L_{min}}{L_{min}}, \qquad
C_M = \frac{L_{max} - L_{min}}{L_{max} + L_{min}}, \qquad
C_R = \frac{L_{max}}{L_{min}},

where L_min and L_max are respectively the minimum and maximum luminance values of the scene. Throughout this book C_R is used as the contrast
definition.
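For concreteness, the sketch below computes the three contrast measures for a luminance image L, assumed here to be a matrix of positive values; the variable names are illustrative only.

% Sketch: Weber, Michelson, and ratio contrast for a luminance image L
% (positive values, e.g., in cd/m^2).
Lmax = max(L(:));
Lmin = min(L(:));

C_W = (Lmax - Lmin) / Lmin;           % Weber contrast
C_M = (Lmax - Lmin) / (Lmax + Lmin);  % Michelson contrast
C_R = Lmax / Lmin;                    % Ratio contrast (used in this book)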

1.1.2 An Introduction to the Human Eye


The eye is an organ that gathers light onto photoreceptors, which then
convert light into signals (see Figure 1.7). These are transmitted through
the optical nerve to the visual cortex, an area of the brain that processes
these signals producing the perceived image. This full system, which is
responsible for vision, is referred to as the human visual system (HVS) [140].
Light, which enters the eye, first passes through the cornea, a transparent membrane. Then it enters the pupil, an aperture that is modified by
the iris, a muscular diaphragm. Subsequently, light is refracted by the lens
and hits photoreceptors in the retina. Note that inside the eye there are two
liquids, the vitreous and aqueous humors. The former fills the eye, keeping
its shape and the retina against the inner wall. The latter is between the
cornea and the lens and maintains the intraocular pressure [140].
There are two types of photoreceptors: cones and rods. The cones,
numbering around 6 million, are located mostly in the fovea. They are
sensitive at luminance levels between 10^-2 cd/m2 and 10^8 cd/m2 (photopic
vision or daylight vision), and they are responsible for the perception of high
frequency pattern, fast motion, and colors. Furthermore, color vision is due
to three types of cones: short wavelength cones, sensitive to wavelengths
around 435 nm; middle wavelength cones, sensitive around 530 nm; and
long wavelength cones, sensitive around 580 nm. The rods, numbering
around 90 million, are sensitive at luminance levels between 10^-6 cd/m2
and 10 cd/m2 (scotopic vision or night vision). The rods are more sensitive

Figure 1.7. The human eye.


than cones but do not provide color vision. This is the reason why we are
unable to discriminate between colors at low level illumination conditions.
There is only one type of rod and it is located around the fovea but is
absent in it. This is why high frequency patterns cannot be distinguished
at low lighting conditions. The Mesopic range, where both rods and cones
are active, is defined between 10^-2 cd/m2 and 10 cd/m2. Note that an
adaptation time is needed for passing from photopic to scotopic vision
and vice versa; for more details, see [140]. The rods and cones compress
the original signal, reducing the dynamic range of incoming light. This
compression follows a sigmoid function:
\frac{R}{R_{max}} = \frac{I^n}{I^n + \sigma^n},

where R is the photoreceptor response, R_max is the maximum photoreceptor response, and I is the light intensity. The variables σ and n are
respectively the semisaturation constant and the sensitivity control exponent, which are different for cones and rods [140].
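The compressive effect of this sigmoid can be visualized with a few lines of Matlab; the values of σ and n in the sketch below are illustrative assumptions, not measured constants.

% Sketch: normalized photoreceptor response R/Rmax = I^n / (I^n + sigma^n).
I     = logspace(-4, 6, 1000);   % light intensity (arbitrary units)
sigma = 1.0;                     % semisaturation constant (assumed value)
n     = 0.73;                    % sensitivity control exponent (assumed value)

R_norm = I.^n ./ (I.^n + sigma^n);
semilogx(I, R_norm);
xlabel('I');
ylabel('R / R_{max}');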

1.1.3 Color Spaces


A color space is a mathematical description for representing colors, typically represented by three components called primary colors. There are
two classes of color spaces: device dependent and device independent. The
former describes the color information in relation to the technology used by
the color device to reproduce the color. In the case of a computer monitor
it depends on the set of primary phosphors, while in an ink-jet printer it
depends on the set of primary inks. A drawback of this representation is
that a color with the same coordinates such as R = 150, G = 40, B = 180
will appear different when represented on different monitors. The device
independent class is not dependent on the characteristics of a particular
device; in this way a color represented in such a color space always corresponds to the same color information. A typical device-dependent color
space is the RGB color space. The RGB color space is a Cartesian cube
represented by three additive primaries: Red, Green, and Blue. A typical
independent color space is the CIE 1931 XYZ color space, which is formally dened as the projection of a spectral power distribution I into the
color-matching functions, x, y, and z:
 830
 830
 830
I()x()d, Y =
I()y()d, Z =
I()z()d.
X=
380

380

380

The functions x̄, ȳ, and z̄ are plotted in Figure 1.8. Note that the XYZ
color space was designed in such a way that the Y component measures
the luminance of the color. The chromaticity of the color is derived from

Figure 1.8. The CIE XYZ color space. (a) The CIE 1931 two-degree XYZ color
matching functions. (b) The CIE xy chromaticity diagram showing all colors
that the HVS can perceive. Note that the triangle is the space of colors that can
be represented in sRGB, where the three circles represent the three primaries.

XYZ values as

x = \frac{X}{X + Y + Z}, \qquad y = \frac{Y}{X + Y + Z}.

These values can be plotted, producing the so called CIE xy chromaticity diagram. This diagram shows all colors perceivable by the HVS (see
Figure 1.8(b)).
A popular color space for CRT and LCD monitors is sRGB [195]. This
color space defines as primaries the colors red (R), green (G), and blue (B).
Moreover, each color in sRGB is a linear additive combination of values in
[0, 1] of the three primaries. Therefore, not all colors can be represented,
only those inside the triangle generated by the three primaries (see Figure 1.8(b)).
A linear relationship exists between the XYZ and RGB color spaces.
RGB colors can be converted into XYZ ones using the following conversion
matrix M:



\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = M \begin{bmatrix} R \\ G \\ B \end{bmatrix}, \qquad
M = \begin{bmatrix} 0.412 & 0.358 & 0.181 \\ 0.213 & 0.715 & 0.072 \\ 0.019 & 0.119 & 0.950 \end{bmatrix}.
Furthermore, sRGB presents a nonlinear transformation for each R, G,
and B channel to linearize the signal when displayed on LCD and CRT
monitors. This is because there is a nonlinear relationship between the


Symbol    Description
L_w       HDR luminance value
L_d       LDR luminance value
L_H       Logarithmic mean luminance value
L_avg     Arithmetic mean luminance value
L_max     Maximum luminance value
L_min     Minimum luminance value

Table 1.1. The main symbols used for the luminance channel in HDR image
processing.

output intensity generated by the displaying device and the input voltage.
This relationship is generally approximated with a power function with
value γ = 2.2 (in the case of sRGB, γ = 2.4). The linearization is achieved by
applying the inverse value:

\begin{bmatrix} R_v \\ G_v \\ B_v \end{bmatrix} = \begin{bmatrix} R \\ G \\ B \end{bmatrix}^{1/\gamma},

where R_v, G_v, and B_v are respectively the red, green, and blue channels ready
for visualization. This process is called gamma correction.
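To make these two steps concrete, the sketch below converts a linear RGB image to XYZ with the matrix M given above and applies a simple display gamma of 2.2; the short linear segment of the exact sRGB transfer function is omitted, and img is an assumed input with values in [0, 1].

% Sketch: linear RGB -> XYZ using the matrix M above, plus a simple
% gamma correction with gamma = 2.2 (the exact sRGB curve also has a
% small linear segment near black, omitted here).
M = [0.412 0.358 0.181; ...
     0.213 0.715 0.072; ...
     0.019 0.119 0.950];

[r, c, ~] = size(img);                 % img: linear RGB values in [0,1]
RGB = reshape(img, r * c, 3)';         % 3 x (r*c) matrix of colors
XYZ = M * RGB;                         % per-pixel XYZ values
L   = reshape(XYZ(2, :), r, c);        % luminance channel (the Y component)

gamma  = 2.2;
imgOut = img.^(1 / gamma);             % gamma-corrected image for display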
The RGB color space is very popular in HDR imaging. However, many
computations are calculated in the luminance channel Y from XYZ, which
is usually referred to as L. In addition, common statistics from this luminance are often used, such as the maximum value, Lmax , the minimum
one, Lmin , and the mean value. This can be computed as the arithmetic
average, Lavg or the logarithmic one, LH :
L_{avg} = \frac{1}{N}\sum_{i=1}^{N} L(x_i), \qquad
L_H = \exp\left(\frac{1}{N}\sum_{i=1}^{N} \log\big(L(x_i) + \epsilon\big)\right),

where x_i are the coordinates of the ith pixel, and ε > 0 is a small constant
for avoiding singularities. Note that in HDR imaging, subscripts w and d
(representing world luminance and display luminance, respectively) refer to
HDR and LDR values. The main symbols used in HDR image processing
are shown in Table 1.1 for the luminance channel L.
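Given the luminance channel L, the quantities in Table 1.1 can be computed directly; a small sketch follows, with an assumed value for ε.

% Sketch: common luminance statistics for an HDR luminance channel L.
epsilon = 1e-6;                             % small constant (assumed value)
N    = numel(L);
Lmax = max(L(:));                           % maximum luminance
Lmin = min(L(:));                           % minimum luminance
Lavg = sum(L(:)) / N;                       % arithmetic mean
LH   = exp(sum(log(L(:) + epsilon)) / N);   % logarithmic mean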

2
HDR Pipeline

HDR imaging is a revolution in the field of imaging allowing, as it does,


the ability to use and manipulate physically-real light values. This chapter
introduces the main processes of HDR imaging, which can be best characterized as a pipeline, termed the HDR pipeline. Figure 2.1 illustrates the
distinct stages of the HDR pipeline.
The first stage concerns the generation of HDR content. HDR content
can be captured in a number of ways, although limitations in hardware
technology, until recently, have meant that HDR content capture has typically required the assistance of software. Section 2.1 outlines different ways
in which HDR images can be generated. These include images generated
from a series of still LDR images, using computer graphics, and via expansion from single-exposure images. The section also describes exciting new
hardware that enables native HDR capture.
Due to the explicit nature of high dynamic range values, HDR content
may be considerably larger than its LDR counterpart. To make HDR
manageable, efficient storage methods are necessary. In Section 2.2 HDR
file formats are introduced. Compression methods can also be applied
at this stage. HDR compression methods will be discussed in detail in
Chapter 7.
Finally, HDR content can be natively visualized using a number of new
display technologies. In Section 2.3.2 we introduce the primary native
HDR displays. Such displays are still generally unavailable to the consumer. However, software solutions can be employed to adapt HDR content to be shown on LDR displays while attempting to maintain an HDR
viewing experience. Such software solutions take the form of operators
that compress the range of luminance in the HDR images to the luminance range of the LDR display. These operators are termed tone mappers


Figure 2.1. The HDR pipeline in all its stages. Multiple exposure images are
captured and combined, obtaining an HDR image. Then this image is quantized,
compressed, and stored. Further processing can be applied to the image. For
example, areas of high luminance can be extracted and used to relight a synthetic
object. Finally, the HDR image or a tone mapped HDR image can be visualized
using native HDR monitors or traditional LDR display technologies.

and a large variety of tone mapping operators exist. We will discuss tone
mapping in detail in Chapter 3.

2.1 HDR Content Generation

In this book we will consider four methods of generating HDR content.


The first, and most commonly used until recently, is the generation of
HDR content by combining a number of LDR captures at different exposures through the use of software technology. The second, which is likely
to become more feasible in the near future, is the direct capture of HDR
images using specialized hardware. The third method, popular in the entertainment industries, is the creation of HDR content from virtual environments using physically based renderers. The final method is the generation
of HDR content from legacy content consisting of single exposure captures, using software technology to expand the dynamic range of the LDR
content.

2.1.1 Generating HDR Content by Combining Multiple Exposures


At the time of writing, available consumer cameras are limited since they
can only capture 8-bit images or 12-bit images in RAW format. This does

Figure 2.2. An example of HDR capturing of the Stanford Memorial Church.
Images taken with different shutter speeds: (a) 1/250 sec, (b) 1/30 sec, (c) 1/4 sec,
(d) 2 sec, (e) 8 sec. The HDR image is obtained by combining (a), (b), (c), (d),
and (e). (f) A false color rendering of the luminance channel of the obtained
HDR image. (The original HDR image is courtesy of Paul Debevec [50].)

not cover the full dynamic range of irradiance values in most environments
in the real world. The most commonly used method of capturing HDR
images is to take multiple single-exposure images of the same scene to
capture details from the darkest to the brightest areas as proposed by
Mann and Picard [131] (see Figure 2.2 for an example). If the camera has
a linear response, the radiance values stored in each exposure for each color
channel can be combined to recover the irradiance, E, as
E(x) = \frac{\sum_{i=1}^{N_e} \frac{1}{t_i}\, w(I_i(x))\, I_i(x)}{\sum_{i=1}^{N_e} w(I_i(x))}, \qquad (2.1)

where I_i is the image at the ith exposure, t_i is the exposure time for
I_i, N_e is the number of images at different exposures, and w(I_i(x)) is a
weighting function that removes outliers. For example, high values in one
of the exposures will have less noise than low values. On the other hand,
high values can be saturated, so middle values can be more reliable. An
example of a recovered irradiance map using Equation (2.1) can be seen in
Figure 2.2(f).
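For a camera with a linear response, Equation (2.1) can be implemented directly; the sketch below assumes stack is an r × c × Ne array of linearized exposures with values in [0, 1], t a vector of exposure times, and a simple hat-shaped weight against outliers. The variable names are illustrative, not those of the HDR Toolbox.

% Sketch: merging linearized exposures with Equation (2.1).
% Assumptions: stack is r x c x Ne with values in [0,1] (linear response),
% t is a 1 x Ne vector of exposure times.
w = @(x) 1.0 - abs(2.0 * x - 1.0);     % simple hat weight against outliers

[r, c, Ne] = size(stack);
num = zeros(r, c);
den = zeros(r, c);
for i = 1:Ne
    Ii  = stack(:, :, i);
    wi  = w(Ii);
    num = num + wi .* Ii / t(i);       % weighted, exposure-normalized values
    den = den + wi;
end
E = num ./ max(den, 1e-6);             % recovered irradiance, Equation (2.1)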
Unfortunately, film and digital cameras do not have a linear response
but a more general function f , called the camera response function (CRF).
The CRF attempts to compress as much of the dynamic range of the real
world as possible into the limited 8-bit storage or into film medium. Mann
and Picard [131] proposed a simple method for calculating f , which consists
of fitting the values of pixels at different exposures to a fixed f(x) = ax + b.
This parametric f is very limited and does not support most real CRFs.
Debevec and Malik [50] proposed a simple method for recovering a CRF.
For the sake of clarity this method and others will be presented for gray
channel images. The value of a pixel in an image is given by the application


of a CRF to the irradiance scaled by the exposure time:


I_i(x) = f(E(x)\, t_i).

Rearranging terms and applying a logarithm to both sides we obtain

\log\big(f^{-1}(I_i(x))\big) = \log E(x) + \log t_i. \qquad (2.2)

Assuming that f is a smooth and monotonically increasing function, f
and E can be calculated by minimizing the least square error derived from
Equation (2.2) using pixels from images at different exposures:

O = \sum_{i=1}^{N_e}\sum_{j=1}^{M} \Big( w\big(I_i(x_j)\big)\big[\, g(I_i(x_j)) - \log E(x_j) - \log t_i \,\big] \Big)^2
    + \sum_{x=T_{min}+1}^{T_{max}-1} \big( w(x)\, g''(x) \big)^2, \qquad (2.3)

where g = f^{-1} is the inverse of the CRF, M is the number of pixels used
in the minimization, and T_max and T_min are respectively the maximum and
minimum integer values in all images I_i. The second part of Equation (2.3)
is a smoothing term for removing noise, where the function w is defined as

w(x) = \begin{cases} x - T_{min} & \text{if } x \leq \frac{1}{2}(T_{max} + T_{min}), \\ T_{max} - x & \text{if } x > \frac{1}{2}(T_{max} + T_{min}). \end{cases}
Note that minimization is performed only on a subset of the M pixels,
because it is computationally expensive to evaluate for all pixels. This
subset is calculated using samples from each region of the image.
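For reference, the hat-shaped weight w above is straightforward to implement for 8-bit values; the sketch below assumes T_min = 0 and T_max = 255, and the function name is illustrative.

% Sketch: the hat weighting function w(x) of Equation (2.3), assuming
% 8-bit pixel values, i.e., Tmin = 0 and Tmax = 255.
function w = WeightHat(x)
    Tmin = 0;
    Tmax = 255;
    w = zeros(size(x));
    low     = x <= 0.5 * (Tmax + Tmin);
    w(low)  = x(low) - Tmin;
    w(~low) = Tmax - x(~low);
end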
function imgHDR = BuildHDR(format, lin_type, weightFun)
%is a weight function defined?
if(~exist('weightFun'))
    weightFun = 'all';
end
%is the linearization type of the images defined?
if(~exist('lin_type'))
    lin_type = 'gamma2.2';
end
%Read images from the current directory
[stack, exposure_stack] = ReadLDRStack(format);
%Calculation of the CRF
lin_fun = [];
switch lin_type
    case 'tabledDeb97'
        %Weight function
        W = WeightFunction(0:1/255:1, weightFun);
        %Convert the stack into a smaller stack
        stack2 = StackLowRes(stack);
        %Linearization process using Debevec and Malik 1998's method
        lin_fun = zeros(256,3);
        for i=1:3
            g = gsolve(stack2(:,:,i), exposure_stack, 10, W);
            lin_fun(:,i) = (g/max(g));
        end
    otherwise
end
%Combine different exposures using the linearization function
imgHDR = CombineLDR(stack, exp(exposure_stack)+1, lin_type, lin_fun, weightFun);
end

Listing 2.1. Matlab Code: Combining multiple LDR exposures.

Listing 2.1 shows Matlab code for combining multiple LDR exposures
into a single HDR. The full code is given in the file BuildHDR.m. The
function accepts as input format, an LDR format for reading LDR images. The second parameter lin_type outlines the linearization method
to be used, where possible options are linearized for no linearization
(for images that are already linearized on input), gamma2.2 for applying a gamma function of 2.2, and tabledDeb97, which would employ the
Debevec and Malik method described above. Finally, the type of weighting
function, weightFun, can also be input. The resulting HDR image is output. After
handling the input parameters, the function ReadLDRStack inputs the images from the current directory. The code block in the case statement case
tabledDeb97 handles the linearization using Debevec and Malik's
method outlined previously. Finally, CombineLDR.m combines the stack
using the appropriate weighting function.
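A typical call might therefore look like the sketch below; the format string is assumed here to be a file extension understood by ReadLDRStack.

% Sketch: building an HDR image from the bracketed JPEG exposures found
% in the current directory, recovering the CRF with Debevec and Malik's
% method (the 'jpg' format string is an assumption).
imgHDR = BuildHDR('jpg', 'tabledDeb97', 'all');

% If the images are known to be linear already:
% imgHDR = BuildHDR('jpg', 'linearized', 'all');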
Mitsunaga and Nayar [149] improved Debevec and Malik's algorithm
with a more robust method based on a polynomial representation of f .
They claim that any response function can be modeled using a high-order
polynomial:
I_i(x) = f(E(x)\, t_i) = \sum_{k=0}^{P} c_k\, (E(x)\, t_i)^k.

At this point the calibration process can be reduced to the estimation


of the polynomial order P and the coefficients c_k. Taking two images of a


scene with two different exposure times t_1 and t_2, the ratio R can be
written as

R = \frac{t_1}{t_2} = \frac{I_1(x)}{I_2(x)}. \qquad (2.4)
The brightness measurement I_i(x) produced by an imaging system is
related to the scene radiance E(x)t_i at time i via a response function I_i(x) =
f(E(x)t_i). From this, I_i(x) can be rewritten as E(x)t_i = g(I_i(x)), where
g = f^{-1}. Since the response function of an imaging system is related to
the exposure ratio, Equation (2.4) can be rewritten as
R_{1,2}(x) = \frac{\sum_{k=0}^{P} c_k\, I_1(x)^k}{\sum_{k=0}^{P} c_k\, I_2(x)^k}, \qquad (2.5)

where the images are ordered in a way that t_1 < t_2 so that R ∈ (0, 1).
The number of (f, R) pairs that satisfy Equation (2.5) is infinite. This
ambiguity is alleviated by the use of the polynomial model. The response
function can be recovered by formulating an error function such as
\epsilon = \sum_{i=1}^{N_e}\sum_{j=1}^{M} \left[ \sum_{k=0}^{P} c_k\, I_i(x_j)^k - R_{i,i+1}(x_j) \sum_{k=0}^{P} c_k\, I_{i+1}(x_j)^k \right]^2,

where all measurements can be normalized so that I_i(x) is in [0, 1]. An
additional constraint can be introduced if the indeterminable scale is fixed
as f(1) = I_max, from which it follows that c_P = I_{max} - \sum_{k=0}^{P-1} c_k. The coefficients
of the response function are determined by solving a linear system, setting

\frac{\partial \epsilon}{\partial c_k} = 0.
To reduce searching, when the number of images is high (more than nine),
an iterative scheme is used. In this case, the current ratio at the kth step
is used to evaluate the coefficients at the (k+1)th step.
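As an illustration, the error ε above can be evaluated for a candidate coefficient vector with a few lines of Matlab; in the sketch below I1 and I2 are assumed to be two normalized exposures with exposure ratio R, and c = [c_0, ..., c_P] is ordered from the constant term upwards.

% Sketch: evaluating the Mitsunaga-Nayar error for two normalized
% exposures I1, I2 (values in [0,1]) with exposure ratio R = t1/t2 and
% candidate polynomial coefficients c = [c_0, ..., c_P].
g   = @(I, c) polyval(fliplr(c), I);    % g(I) = sum_k c_k I^k
err = sum(sum((g(I1, c) - R * g(I2, c)).^2));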
Robertson et al. [184, 185] proposed a method that estimates the unknown response function as well as the irradiance E(x) through the use
of the maximum likelihood approach, where the objective function to be
minimized is
O(I, E) = \sum_{i=1}^{N_e}\sum_{j=1}^{M} w_{i,j}\, \big( I_i(x_j) - t_i\, E(x_j) \big)^2,

where w is a weight defined by a Gaussian function, which represents the


noise in the imaging system used to capture the images. Note that all
the presented methods for recovering the CRF can be extended to colored
images by applying each method separately to each color band.
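Similarly, the Robertson et al. objective can be evaluated for a given irradiance estimate; in the sketch below, stack holds linearized exposures in [0, 1], t the exposure times, E the current irradiance estimate, and a Gaussian weight centered on mid-gray is an assumption.

% Sketch: evaluating the Robertson et al. objective O(I, E) for a stack
% of exposures (r x c x Ne, values in [0,1]), exposure times t, and a
% current irradiance estimate E. The Gaussian weight is an assumption.
w = @(x) exp(-4 * (x - 0.5).^2 / 0.25);

O = 0;
Ne = size(stack, 3);
for i = 1:Ne
    Ii = stack(:, :, i);
    O  = O + sum(sum(w(Ii) .* (Ii - t(i) * E).^2));
end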


The multiple exposure methods assume that images are perfectly aligned,
there are no moving objects, and CCD noise is not a problem. These are
very rare conditions when real-world images are captured. These problems
can be minimized by adapting classic alignment, ghost, and noise removal
techniques from image processing and computer vision (see [12, 71, 94, 98]).
HDR videos can be captured using still images, with techniques such
as stop-motion or time-lapse. Under controlled conditions, these methods
may provide good results with the obvious limitations that stop-motion and
time-lapse entail. Kang et al. [96] extended the multiple exposure approach from still images to video. Kang et al.'s basic concept is to have a programmed video camera that temporally varies the shutter speed at each frame. The final video is generated by aligning and warping different frames, combining two frames into an HDR one. However, the frame rate of this method is low, around 15 fps, and the scene can only contain slow-moving objects; otherwise artifacts will appear. The method is thus not well suited for real-world situations. Nayar and Branzoi [153] developed an adaptive dynamic range camera where a controllable liquid crystal light modulator is placed in front of the camera. This modulator adapts the exposure of each pixel on the image detector, allowing the capture of scenes with a very large dynamic range. Finally, another method for capturing HDR videos is to capture multiple videos at different exposures using several LDR video cameras with a light beam splitter [9]. Recently, E3D Creative LLC applied the beam splitter technique in the professional field of cinematography using a rig for stereo with two Red One video cameras [125]. This allows one to capture high definition video streams in HDR.

2.1.2 Capturing HDR Content Natively


A few companies provide HDR cameras based on automatic multiple exposure capturing. The three main cameras are SpheronCam HDR by
SpheronVR [192], Panoscan MK-3 by Panoscan Ltd [164], and Civetta
360 by Weiss AG [229]. These are full 360-degree panoramic cameras with
high resolution. The cameras can capture full HDR images; see Table 2.1
for comparisons.
Device          | Dynamic Range (f-stops) | Max. Resolution (Pixels) | Max. Capturing Time (Seconds)
Civetta         | 30                      | 14144 × 7072             | 40
SpheronCam HDR  | 26                      | 10624 × 5312             | 1680 (28 min)
Panoscan MK-3   | 11                      | 12000 × 6000             | 54*

Table 2.1. A summary of the commercially available HDR spherical cameras. Superscript * refers to data for a single pass, where three passes are needed to obtain an HDR image.


These cameras are rather expensive (on average more than $35,000) and designed for commercial use only. The development of these particular cameras was mainly due to the necessity of quickly capturing HDR images for use in image-based lighting (see Chapter 5), which is extensively used in applications including visual effects, computer graphics, automotive design, and product advertising. More recently, camera manufacturers such as Canon, Nikon, Sony, Sigma, etc. have introduced in consumer or DSLR cameras some HDR capturing features, such as multiexposure capturing or automatic exposure bracketing and automatic exposure merging.
The alternative to multiple exposure techniques is to use CCD sensors that can natively capture HDR values. In recent years, CCDs that record into 10/12-bit channels in the logarithmic domain have been introduced by many companies, such as Cypress Semiconductor [45], Omron [160], PTGrey [176], and Neuricam [155]. The main problem with these sensors is that they use low resolutions (640 × 480) and can be very noisy. Therefore, their applications are mainly oriented towards security and automation in factories.
A number of companies have proposed high quality solutions for the entertainment industry. These are the Viper camera by Thomson GV [200]; the Red One, Red Scarlet, and Red Epic cameras by Red Digital Cinema Camera Company [179]; the Phantom HD camera by Vision Research [211]; and Genesis by Panavision [163]. All these video cameras present high frame rates, low noise, full HD (1920 × 1080) or 4K resolution (4096 × 3072), and


Figure 2.3. An example of a frame of the HDR video camera of Unger and
Gustavson [205]. (a) A false color image of the frame. (b) A tone mapped
version of (a).



Figure 2.4. An example of a frame of the HDR video camera of SpheronVR. (a) A
false color image of the frame. (b) A tone mapped version of (a). (Image courtesy
of Jassim Happa and the Visualization Group, WMG, University of Warwick.)

a good dynamic range, 10/12/16-bit per channel in the logarithmic/linear domain. However, they are extremely expensive and they do not capture the full dynamic range that can be seen by the HVS at any one time.
In 2007, Unger and Gustavson [205] presented an HDR video camera for research purposes (see Figure 2.3). It is capable of capturing high dynamic range content at 512 × 896 resolution, 25 fps, and a dynamic range of 1,000,000 to 1. The main disadvantage is that the video camera uses three separate CCD sensors, one for each of the three color primaries (RGB), and it has the problem that for rapid scene motion, artifacts such as motion blur may appear. In addition, due to the limitations of the internal antireflex coating in the lens, system flare and glare artifacts can also appear.
In 2009, SpheronVR, in collaboration with the University of Warwick [33], developed an HDR video camera capable of capturing high dynamic range content at 1920 × 1080 resolution, 30-50 fps, and a 20 f-stop dynamic range (see Figure 2.4). The HDR video data stream is initially recorded on an HDD array. A postprocessing engine transforms it to a sequence of HDR files (typically OpenEXR), taking lens vignetting, spherical distortion, and chromatic aberration into account.

2.1.3 The Generation of HDR Images


Computer graphics rendering is another common method of generating HDR content. Frequently, this can be augmented by photographic methods.
Digital image synthesis is the process of rendering images from virtual scenes composed of formally defined geometric objects, materials, and lighting, all captured from the perspective of a virtual camera. Two main algorithms are usually employed for rendering: ray tracing and rasterization (see Figure 2.5).


Figure 2.5. An example of the state of the art of rendering quality for ray tracing and rasterization. (a) A ray-traced image by Piero Banterle using Maxwell Render by NextLimit Technologies [156]. (b) A screenshot from the game Crysis (© 2007 Crytek GmbH [44]).

Ray tracing. Ray tracing [232] models the geometric properties of light by calculating the interactions of groups of photons, termed rays, with geometry. This technique can reproduce complex visual effects without much modification to the traditional algorithm. Rays are shot from the virtual camera and traverse the scene until the closest object is hit (see Figure 2.6).

Figure 2.6. Ray tracing. For each pixel in the image, a primary ray is shot through the camera into the scene. As soon as it hits a primitive, the lighting for the hit point is evaluated. This is achieved by shooting more rays. For example, a ray towards the light is shot in the evaluation of lighting. A similar process is repeated for reflections, refractions, and interreflections.


Here the material properties of the object at that point are used to calculate the illumination, and a ray is shot towards any light sources to account for shadow visibility. The material properties at the intersection point further dictate whether more rays need to be shot into the environment and in which direction; the process is computed recursively. Due to its recursive nature, ray tracing and extensions of the basic algorithm, such as path tracing and distributed ray tracing, are naturally suited to solving the rendering equation [95], which describes the transport of light within an environment. Ray tracing methods can thus simulate effects such as shadows, reflections, refractions, indirect lighting, subsurface scattering, caustics, motion blur, and others in a straightforward manner.
While ray tracing is computationally expensive, recent algorithmic and hardware advances are making it possible to compute it at interactive rates for dynamic scenes [212].

Rasterization. Rasterization uses a different approach than ray tracing for rendering. The main concept is to project each primitive of the scene on the screen (frame buffer) and discretize it into fragments, which are then rasterized onto the final image. When a primitive is projected and discretized, visibility has to be solved to obtain a correct visualization and to avoid incorrect overlap between objects. For this task, the Z-buffer [32] is generally used. The Z-buffer is an image of the same size as the frame buffer that stores the depth values of previously solved fragments. For each fragment at a position x, its depth value, F(x)_z, is tested against the one stored in the Z-buffer, Z(x)_z. If F(x)_z < Z(x)_z, the new fragment is written into the frame buffer, and F(x)_z is placed in the Z-buffer. After the depth test, lighting is evaluated for all fragments. However, shadows, reflections, refractions, and interreflections cannot be handled natively with this process since rays are not shot. These effects are often emulated by rendering the scene from different positions. For example, shadows can be emulated by calculating a Z-buffer from the light source position and applying a depth test during shading to determine if the point is in shadow or not. This method is known as shadow mapping [234].
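The depth test described above can be sketched in a few lines of Matlab (purely illustrative; Fz, Z, fragmentColor, and frameBuffer are assumed variables of matching size, not toolbox code):

% Z-buffer test: keep only fragments closer than the stored depth.
mask = Fz < Z;
Z(mask) = Fz(mask);                      % update the depth buffer
for i=1:3
    tmp = frameBuffer(:,:,i);
    col = fragmentColor(:,:,i);
    tmp(mask) = col(mask);               % write the surviving fragments
    frameBuffer(:,:,i) = tmp;
end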
The main advantage of rasterization is that it is supported by current graphics hardware, which allows high performance in terms of drawn primitives. Such performance is achieved since it is straightforward to parallelize rasterization: fragments are coherent and independent, and data structures are easy to update. Finally, the whole process can be organized into a pipeline. Nevertheless, the emulation of physically based light transport effects (i.e., shadows, reflections/refractions, etc.) is not as accurate as ray tracing and is biased in many cases. For more detail on rasterization, see [10].


2.1.4 Expanding Single-Exposure HDR


The emergence of HDR displays has focused aspects of HDR research on expanding the many decades of legacy LDR content to take advantage of these new displays. This is of particular relevance as the consumer release of such displays becomes more imminent. Expansion methods attempt to recreate the missing content for LDR images in which the HDR information was clamped or severely compressed. A number of methods for expanding LDR content have been proposed. We provide a comprehensive overview of these methods in Chapter 4. Expansion methods have furthermore enabled the development of compression methods that use luminance compression via tone mapping followed by an LDR compression stage for encoding. The decoding stage uses the traditional LDR decoder followed by an expansion stage that uses the inverse of the tone mapping operator to expand the luminance. These methods are detailed in Chapter 7.

2.2 HDR Content Storing

Once HDR content is generated, there is the need to store, distribute, and process these images. An uncompressed HDR pixel is represented using three single precision floating point numbers [86], assuming three bands for RGB colors. This means that a pixel uses 12 bytes of memory, and at a high definition (HD) resolution of 1920 × 1080 a single image would occupy approximately 24 MB. This is much larger than the approximately 6 MB required to store an equivalent LDR image without compression. Researchers have been working on efficient methods to store HDR content to address the high memory demands. Initially, only compact representations of floating point numbers were used for storing HDR. These methods are still commonly in use in HDR applications and will be covered in this section. More recently, researchers have focused their efforts on compression methods, which will be presented in Chapter 7.
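The figures above can be checked with a quick calculation (sizes in MB):

% Uncompressed HDR frame: three 32-bit floats per pixel.
hdrMB = 1920 * 1080 * 3 * 4 / 2^20;   % ~23.7 MB
% Uncompressed LDR frame: three 8-bit integers per pixel.
ldrMB = 1920 * 1080 * 3 * 1 / 2^20;   % ~5.9 MB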
HDR values are usually stored using single precision floating point numbers. Integer numbers, which are extensively used in LDR imaging, are not practical for storing HDR values. For example, a 32-bit unsigned integer can represent values in the range [0, 2^32 - 1], which seems to be enough for most HDR content. However, this is not sufficient to cover the entire range experienced by the HVS. It is also not suitable when simple image processing between two or more HDR images is carried out; for example, when adding or multiplying, precision can easily be lost and overflows may occur. Such conditions make floating point numbers preferable to integer ones for real-world values [86].
Using single precision floating point numbers, an image occupies 96 bits per pixel (bpp). Ward [221] proposed the first solution to this problem,


RGBE, which was originally created for storing HDR values generated by
the Radiance rendering system [223]. This method stores a shared exponent
between the three colors, assuming that it does not vary much between
them. The encoding of the format is defined as




E = \left\lceil \log_2\big(\max(R_w, G_w, B_w)\big) + 128 \right\rceil,

R_m = \left\lfloor \frac{256\, R_w}{2^{E-128}} \right\rfloor, \quad G_m = \left\lfloor \frac{256\, G_w}{2^{E-128}} \right\rfloor, \quad B_m = \left\lfloor \frac{256\, B_w}{2^{E-128}} \right\rfloor,

and the decoding as

R_w = \frac{R_m + 0.5}{256}\, 2^{E-128}, \quad G_w = \frac{G_m + 0.5}{256}\, 2^{E-128}, \quad B_w = \frac{B_m + 0.5}{256}\, 2^{E-128}.

The mantissas of the red, R_m, green, G_m, and blue, B_m, channels and the exponent, E, are then each stored in an unsigned char (8-bit), achieving a final format of 32 bpp. The RGBE encoding covers 76 orders of magnitude, but the encoding does not support the full gamut of colors and negative values. To solve this, an image can be converted to the XYZ color space before encoding. This case is referred to as the XYZE format. Recently, the RGBE format has been implemented in graphics hardware on the NVIDIA G80 series [99], allowing very fast encoding/decoding for real-time applications.
function imgRGBE = float2RGBE(img)
    [m, n, c] = size(img);
    imgRGBE = zeros(m, n, 4);
    v = max(img, [], 3);
    Low = find(v < 1e-32);
    v2 = v;
    [v, e] = log2(v);
    e = e + 128;
    v = v * 256 ./ v2;
    v(Low) = 0;
    e(Low) = 0;
    for i=1:3
        imgRGBE(:,:,i) = round(img(:,:,i) .* v);
    end
    imgRGBE(:,:,4) = e;
end

Listing 2.2. Matlab Code: RGBE encoding [221].


function img = RGBE2float(imgRGBE)
    [m, n, c] = size(imgRGBE);
    img = zeros(m, n, 3);
    E = double(imgRGBE(:,:,4) - 128 - 8);
    f = pow2(1.0, E);
    f(find(imgRGBE(:,:,4) == 0)) = 0;
    for i=1:3
        img(:,:,i) = double(imgRGBE(:,:,i)) .* f;
    end
end

Listing 2.3. Matlab Code: RGBE decoding [221].

Listing 2.2 and Listing 2.3 show the Matlab code for encoding and decoding RGBE values from a natively stored HDR image (consisting of a float per color channel).
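A possible round trip with these two functions, assuming img is an HDR image already loaded in memory as an m × n × 3 array, would be:

imgRGBE = float2RGBE(img);                          % 32 bpp RGBE encoding
imgRec  = RGBE2float(imgRGBE);                      % back to floating point
err = max(abs(imgRec(:) - img(:)) ./ max(img(:), 1e-6));   % small but nonzero: RGBE is lossy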
Ward proposed another HDR image format, a 24/32 bpp perceptually based format, entitled LogLuv [111]. This image format assigns more bits to luminance in the logarithmic domain than to colors in the linear domain. Firstly, an image is converted to the Luv color space. The 32 bpp format assigns 15 bits to luminance and 16 bits to chromaticity, and it is defined as


L_e = \lfloor 256 (\log_2 Y_w + 64) \rfloor, \quad u_e = \lfloor 410\, u' \rfloor, \quad v_e = \lfloor 410\, v' \rfloor,   (2.6)

and the decoding as

Y_w = 2^{(L_e + 0.5)/256 - 64}, \quad u' = \frac{u_e + 0.5}{410}, \quad v' = \frac{v_e + 0.5}{410},   (2.7)

where u' and v' are the chromaticity coordinates. The 24 bpp format changes slightly in the constants of Equation (2.6) and Equation (2.7) according to the distribution of bits: 10 bits are allocated to luminance and 14 bits to chromaticity. While the 32-bpp format covers 38 orders of magnitude, the 24-bpp format covers only 4.8 orders of magnitude. The main advantage of LogLuv over RGBE/XYZE is that the format stores luminance and color information separately, making these values directly usable for applications such as tone mapping (see Section 2.3.1), color manipulation, etc.
The Matlab code for LogLuv encoding, based on Equation (2.6), is
shown in Listing 2.4. Similarly, the Matlab code for decoding, using
Equation (2.7), is given in Listing 2.5.


function imgLogLuv = float2LogLuv(img)
    % Conversion from RGB to XYZ
    imgXYZ = ConvertRGBXYZ(img, 0);
    imgLogLuv = zeros(size(img));
    % Encoding luminance Y
    Le = floor(256 * (log2(imgXYZ(:,:,2)) + 64));
    imgLogLuv(:,:,1) = ClampImg(Le, 0, 65535);
    % CIE (u,v) chromaticity values
    norm = (imgXYZ(:,:,1) + imgXYZ(:,:,2) + imgXYZ(:,:,3));
    x = imgXYZ(:,:,1) ./ norm;
    y = imgXYZ(:,:,2) ./ norm;
    % Encoding chromaticity
    norm_uv = (-2 * x + 12 * y + 3);
    u_prime = 4 * x ./ norm_uv;
    v_prime = 9 * y ./ norm_uv;
    Ue = floor(410 * u_prime);
    imgLogLuv(:,:,2) = ClampImg(Ue, 0, 255);
    Ve = floor(410 * v_prime);
    imgLogLuv(:,:,3) = ClampImg(Ve, 0, 255);
end

Listing 2.4. Matlab Code: LogLuv encoding [111].

function imgRGB = LogLuv2float(img)
    imgXYZ = zeros(size(img));
    % Decoding luminance Y
    imgXYZ(:,:,2) = 2.^((img(:,:,1) + 0.5) / 256 - 64);
    % Decoding chromaticity
    u_prime = (img(:,:,2) + 0.5) / 410;
    v_prime = (img(:,:,3) + 0.5) / 410;
    norm = 6 * u_prime - 16 * v_prime + 12;
    x = 9 * u_prime ./ norm;
    y = 4 * v_prime ./ norm;
    z = 1 - x - y;
    norm = RemoveSpecials(imgXYZ(:,:,2) ./ y);
    imgXYZ(:,:,1) = x .* norm;
    imgXYZ(:,:,3) = z .* norm;
    % Conversion from XYZ to RGB
    imgRGB = ConvertRGBXYZ(imgXYZ, 1);
end

Listing 2.5. Matlab Code: LogLuv decoding [111].
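Analogously to the RGBE case, a possible round trip with Listing 2.4 and Listing 2.5, assuming img holds positive HDR values, would be:

imgLogLuv = float2LogLuv(img);
imgRec    = LogLuv2float(imgLogLuv);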

Another common HDR image format is the half floating point format, which is part of the specification of the OpenEXR file format [89]. In this representation each color is encoded using a half precision floating point number, which is a 16-bit implementation of the IEEE 754 standard [86], and it is defined as

H = \begin{cases}
0 & \text{if } E = 0 \wedge M = 0,\\
(-1)^S\, 2^{E-15}\, \frac{M}{1024} & \text{if } E = 0,\\
(-1)^S\, 2^{E-15}\, \left(1 + \frac{M}{1024}\right) & \text{if } 1 \leq E \leq 30,\\
(-1)^S\, \infty & \text{if } E = 31 \wedge M = 0,\\
\text{NaN} & \text{if } E = 31 \wedge M > 0,
\end{cases}
where S is the sign, occupying 1 bit, M is the mantissa, occupying 10 bits, and E is the exponent, occupying 5 bits. Therefore, the final format is 48 bpp, covering around 10.7 orders of magnitude. The main advantage, despite the size, is that this format is implemented in graphics hardware, allowing real-time applications to use HDR images. This format is considered the de facto standard in the movie industry [51].
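The piecewise definition above can be turned into a small decoding routine. The following sketch (not part of the OpenEXR library or the HDR Toolbox) reconstructs the value H from its sign, exponent, and mantissa fields:

function val = half2float(S, E, M)
    % Decode one half precision value from its fields, following the
    % piecewise definition given above.
    if (E == 0 && M == 0)
        val = 0;
    elseif (E == 0)
        val = (-1)^S * 2^(E - 15) * (M / 1024);       % denormalized values
    elseif (E >= 1 && E <= 30)
        val = (-1)^S * 2^(E - 15) * (1 + M / 1024);   % normalized values
    elseif (E == 31 && M == 0)
        val = (-1)^S * Inf;
    else
        val = NaN;
    end
end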
Several medium dynamic range formats, which have the purpose of covering the classic film range of between 2 and 4 orders of magnitude, have been proposed by the entertainment industry. However, they are not suitable for HDR images/videos. The log encoding image format created by Pixar is one such example [51].

2.3 Visualization of HDR Content

Following the HDR pipeline, two broad methods have been utilized for displaying HDR content. The first of these methods uses traditional LDR displays augmented by software that compresses the luminance of the HDR content in order to fit the dynamic range of the LDR display. The second method natively displays the HDR content directly using the facilities of new HDR-enabled monitors.

2.3.1 Tone Mappers


Until recently, the only method to visualize HDR content was to adapt the
content by compressing the luminance values to suit the dynamic range of


the display. This was made possible by the use of tone mapping operators
that convert real-world luminance to display luminance. Consequently a
large number of tone mapping operators have been developed that vary
in terms of output quality and computational cost. We will discuss tone
mapping and present a number of tone mappers in detail in Chapter 3.

2.3.2 Native Visualization of HDR Content


Display technologies that can natively visualize HDR images and videos without using TMOs are now becoming available. The first such device was a viewer of HDR slides, termed the HDR viewer [190, 224]. The HDR Monitor [190] was the first monitor to visualize HDR content. For an overview of these devices, see Table 2.2. The methods to display content on these devices both divide an HDR image into a detail layer with colors and a luminance layer that back-modulates the first layer.
Device                        | Lmax (cd/m²) | Lmin (cd/m²) | Dynamic Range
HDR Viewer                    | 5,000        | 0.5          | 10,000:1
HDR Monitor: Projector-based  | 2,700        | 0.054        | 50,000:1
HDR Monitor: LED-based 37"    | 3,000        | 0.015        | 200,000:1

Table 2.2. The features of early HDR display devices.

The HDR viewer. Ward [224] and Ledda et al. [112] presented the first native viewer of HDR images (see Figure 2.7). Their device is inspired by the classic stereoscope, a device used at the turn of the 19th to 20th century for displaying three-dimensional images.


Figure 2.7. The HDR viewer by Ward [224] and Ledda et al. [112]. (a) A scheme
of the HDR viewer. (b) A photograph of the HDR viewer prototype. (The
photograph is courtesy of Greg Ward [224].)


Figure 2.8. The processing pipeline to generate two images for the HDR viewer
by Ward [224] and Ledda et al. [112].

The HDR viewer is composed of three main parts: two lenses, two 50-watt lamps, and two film transparencies, one for each eye, that encode an image taken/calculated at slightly different camera positions to simulate the effect of depth. The two lenses are large-expanse extra-perspective (LEEP) ARV-1 optics by Erik Howlett [87], which allow a 120-degree field of view. Moreover, an extra transparency image is needed for each eye that increases the dynamic range through light source modulation, because a film transparency can encode only 8-bit images due to limitations of the medium. Note that when light passes through a transparent surface it is modulated, using a simple multiplication, by the level of transparency.
The processing method splits an HDR image into two; for the complete pipeline, see Figure 2.8. The first image, which is used to modulate the light source, is created by applying a 32 × 32 Gaussian filter to the square root of the image luminance. The second image, in front of the one for modulation, is generated by dividing the HDR image by the modulation one. To take into account the chromatic aberration of the optics, the red channel is scaled by 1.5% more than the blue one, with the green channel halfway in between. Note that while the image in front encodes colors and details, the back one, used for modulation, encodes the global luminance distribution.
The device and the processing technique allow images with a 10,000:1 dynamic range to be displayed, where the measured maximum and minimum luminance are respectively 5,000 cd/m² and 0.5 cd/m². Ledda et al. [112] validated the device against reality and the histogram adjustment operator [110] (see Section 3.2.5) on a CRT monitor using a series of


psychophysical experiments. The results showed that the HDR viewer is closer to reality than a TMO reproduced on a CRT monitor.
The system was the first solution for the native visualization of HDR images. However, it is limited to displaying only static images, and the cost of printing the four film transparencies, for each scene to be viewed, is around $200 US. For obvious reasons it can only be used by one person at a time.
HDR monitors. Seetzen et al. [190] developed the first HDR monitors. These were based on two technologies: a digital light processing (DLP) projector, and light-emitting diodes (LEDs). As with the HDR viewer, there is a modulated light source, which boosts the dynamic range of a front layer that encodes details and colors. Both the DLP and LED HDR monitors use LCDs for displaying the front layer.


Figure 2.9. The HDR monitor based on projector technology. (a) A scheme of the
monitor. (b) A photograph of the HDR Monitor. (The photograph is courtesy
of Matthew Trentacoste [190].)


The DLP projector-driven HDR display was the first of these technologies to be developed. This method uses a DLP projector to modulate the light (see Figure 2.9). The processing method for creating images for the projector is similar to the method for the HDR viewer described previously. However, there are a few differences. Firstly, chromatic aberration correction is removed because there are no optics. Secondly, the filtering of the square root luminance is modeled on the point spread function of the projector. Finally, the response functions are measured for both the LCD panel and the projector, and their inverses are applied to the modulation image and front image to linearize the signal.


Figure 2.10. The HDR monitor based on LCD and LED technologies. (a) The scheme of a part of the monitor in a lateral section. (b) The scheme of a part of the monitor in a frontal section. (c) A photograph of the HDR monitor. (d) The first commercial HDR display, the SIM2 Grand Cinema SOLAR 47. (Image courtesy of SIM2.)


The measured dynamic range of the 15.1" prototype monitor is 50,000:1, where the measured maximum luminance level is 2,700 cd/m² and the minimum luminance level is 0.054 cd/m². For this monitor, the LCD panel is a 15.1" Sharp LQ150X1DG0 with a dynamic range of 300:1, and the projector an Optoma DLP EzPro737 with a contrast ratio of 800:1. However, this technology is impractical under most conditions. The required optical path for the projector is large, around one meter for a 15.1" monitor, which is not practical for home entertainment and wider displays. Moreover, the viewing angle is very small, because a Fresnel lens, which is used for making the luminance values uniform, has a huge falloff at wide viewing angles. Finally, the projector needs to be very bright, and this entails high power consumption and significant heat generation.
The LED-based technology uses a low resolution LED panel to modulate the light (see Figure 2.10). The processing algorithm for the generation of images for the LED and the LCD panels is similar to the one for the DLP device. The main difference is the addition of a step in which the luminance for each LED is determined, based on a square root luminance down-sampled to the resolution of the LED panel, and solving for the values while taking the overlap of the point spread functions of the LEDs into account. The main LED model is the DR-37P, a 37" HDR monitor with a 200,000:1 dynamic range, where the measured maximum and minimum luminance levels are respectively 3,000 cd/m² and 0.015 cd/m². For this monitor, 1,380 Seoul Semiconductor PN-W10290 LEDs were mounted behind a 37" Chi Mei Optoelectronics V370H1L01 LCD panel with a contrast ratio of 250:1 [57].
This display technology did have some issues to solve before becoming a fully fledged consumer product. The first was quality. While the dynamic range was increased, the image quality was reduced, having a lower back-modulated resolution than the projector technology. The second issue was that the LEDs, which are very expensive, consume a lot of power (1,680 W) and require cooling. The heat dissipation is carried out using fans and a liquid-based system, but this results in quite a lot of noise.
In February 2007, Dolby acquired the HDR display pioneers Brightside for approximately $28 million. Since then the technology has been licensed to the Italian company SIM2, which announced the first commercial HDR display, the Solar 47, in 2009 (see Figure 2.10(d)). The Solar 47 is able to utilize full 16-bit processing and produce 65,536 shades per color. The display has a resolution of 1920 × 1080 pixels, 2,206 LEDs in the back plane, and a brightness of 2,000 cd/m². It is expected that many more high dynamic range displays will also enter the market shortly.
Listing 2.6 shows code for visualizing HDR natively using the hardware described previously. This function shows how to split the original HDR image into two layers: the luminance layer and the detail layer, represented by imgLum and


function [imgDet, imgLum] = HDRMonitorDriver(img)
    % Is it a three color channels image?
    check3Color(img);
    % Normalization
    maxImg = max(max(max(img)));
    if (maxImg > 0.0)
        img = img / max(max(max(img)));
    end
    % Luminance channel
    L = sqrt(lum(img));
    % 32x32 Gaussian filter
    L = GaussianFilterWindow(L, 32);
    % Range reduction and quantization at 8-bit for the luminance layer
    invGamma = 1.0 / 2.2;
    imgLum = round(255 * (L.^invGamma)) / 255;
    % Range reduction and quantization at 8-bit for the detail layer
    imgDet = zeros(size(img));
    tmpImgLum = imgLum.^2.2;
    for i=1:3
        imgDet(:,:,i) = round(255 * (img(:,:,i) ./ tmpImgLum).^invGamma) / 255;
    end
    imgDet = RemoveSpecials(imgDet);
end

Listing 2.6. Matlab Code: Visualizing HDR natively.

imgDet, respectively. In this case, a gamma of 2.2 is used as the response function for the luminance and detail layers. Ideally, the response function of the display should be measured to obtain more precise results.
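A possible use of Listing 2.6, assuming img is an HDR image in linear RGB, would be:

[imgDet, imgLum] = HDRMonitorDriver(img);
% imgLum drives the back modulator (projector or LED panel) and imgDet the
% front LCD panel; their optical product approximates the original image.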

3 Tone Mapping

Most of the display devices available nowadays are not able to natively display HDR content. Entry level monitors/displays have a low contrast ratio of only around 200:1. Although high-end LCD televisions have a much higher contrast ratio, on average around 10,000:1, they are typically discretized at 8-bit and rarely at 10-bit per color channel. This means that color shades are limited to 255 values, which is not HDR. In the last two decades researchers have spent significant time and effort in order to compress the range of HDR images and videos so the data may be visualized more naturally on LDR displays.
Tone mapping is the operation that adapts the dynamic range of HDR content to suit the lower dynamic range available on a given display. This reduction of the range attempts to keep some characteristics of the original content such as local and global contrast, details, etc. Furthermore, the
Tone mapping is the operation that adapts the dynamic range of HDR
content to suit the lower dynamic range available on a given display. This
reduction of the range attempts to keep some characteristics of the original
content such as local and global contrast, details, etc. Furthermore, the
perception of the tone mapped image should match the perception of the
real-world scene (see Figure 3.1). Tone mapping is performed using an
operator f or tone mapping operator (TMO), which is defined in general as

f(I) : \mathbb{R}_i^{w \times h \times c} \rightarrow D_o^{w \times h \times c},   (3.1)

where I is the image, w and h are respectively the width and height of I, c is the number of color bands of I (typically c = 3, since in most cases processing is handled in the RGB color space), R_i \subseteq \mathbb{R}, and D_o \subset R_i. D_o = [0, 255] for normal LDR monitors. Furthermore, only the luminance is usually tone mapped by a TMO, while colors are unprocessed. This simplifies


Figure 3.1. The relationship between tone mapped and real-world scenes. Observer 1 and Observer 2 are looking at the same scene but in two different environments. Observer 1 is viewing the scene on a monitor after it has been captured, stored, and tone mapped. Observer 2, on the other hand, is watching the scene in the real world. The final goal is that the tone mapped scene should match the perception of the real-world scene and thus Observers 1 and 2 will perceive the same scene.

Equation (3.1) to

L_d = f_L(L_w) : \mathbb{R}^{w \times h} \rightarrow [0, 255],

f(I) = \begin{bmatrix} R_d \\ G_d \\ B_d \end{bmatrix} = L_d \begin{bmatrix} (R_w / L_w)^s \\ (G_w / L_w)^s \\ (B_w / L_w)^s \end{bmatrix},   (3.2)

where s \in (0, 1] is a saturation factor that decreases saturation, which is usually increased during tone mapping. After the application of f, gamma correction is usually applied and each color channel is clamped in the range [0, 255].
Note that the original gamut is greatly modified in this process, and the tone mapped color appearance can differ greatly from that of the original image. Research addressing this issue will be presented in Section 3.6.
TMOs can be classified into different groups based on f or the image processing techniques they use (see Table 3.1). The main groups of the taxonomy are:
Global operators. The mapping is applied to all pixels with the same operator f.
Local operators. The mapping of a pixel depends on its neighbors, which are given as an input to f.


             | Empirical                      | Perceptual
Global       | LM [189], ELM [189], QT [189]  | PBR^T [204], CBSF [222], VAM [68], HA [110], TDVA^T [168], AL [60]
Local        | SVTR [35], PTR [180]           | MS [167], TMOHCI [17], LMEA^T [113]
Frequency    | LCIS [203], BF [62], GDC [67]  | TF [36], iCAM06 [106]
Segmentation | IM [123], EF [143]             | RM [177, 144], SA [238], LP [104]

Table 3.1. The taxonomy of TMOs, which are divided based on their image processing techniques and their f. Superscript T means that the operator is temporal and suitable for HDR video content. See Table 3.2 for a clarification of the key.
Key     | Name
AL      | Adaptive Logarithmic
BF      | Bilateral Filtering
CBSF    | Contrast Based Scale Factor
EF      | Exposure Fusion
ELM     | Exponential Logarithmic Mapping
GDC     | Gradient Domain Compression
HA      | Histogram Adjustment
iCAM    | Image Color Appearance Model
IM      | Interactive Manipulation
LCIS    | Low Curvature Image Simplifiers
LM      | Linear Mapping
LMEA    | Local Model of Eye Adaptation
LP      | Lightness Perception
MS      | Multi-Scale
PBR     | Perceptual Brightness Reproduction
PTR     | Photographic Tone Reproduction
QT      | Quantization Technique
RM      | Retinex Methods
SA      | Segmentation Approach
SVTR    | Spatially Variant Tone Reproduction
TDVA    | Time Dependent Visual Adaptation
TF      | Trilateral Filtering
TMOHCI  | Tone Mapping Operator for High Contrast Images
VAM     | Visual Adaptation Model

Table 3.2. Key to TMOs for Table 3.1.


Segmentation operators. The image is segmented into broad regions, and a different mapping is applied to each region.
Frequency/Gradient operators. Low and high frequencies of the image are separated. While an operator is applied to the low frequencies, high frequencies are usually kept as they preserve fine details.
Further classifications can be given based on the design philosophy of the TMO, or its use:
Perceptual operators. These operators can be Global, Local, Segmentation, or Frequency/Gradient. The main focus is that the function f models some aspects of the HVS.
Empirical operators. These operators can be Global, Local, Segmentation, or Frequency/Gradient. In this case, f does not try to mimic the HVS, but it tries to create aesthetically pleasing images inspired by other fields, such as photography.
Temporal operators. These operators are designed to be also suitable for HDR video content and animations.
In the next section, the Matlab framework is presented and some basic common functions are described. Afterwards, the main TMOs are reviewed. This review is organized by the image processing techniques described by the taxonomy in Table 3.1. Global operators are discussed in Section 3.2, local operators in Section 3.3, frequency operators in Section 3.4, and segmentation operators in Section 3.5.

3.1 TMO MATLAB Framework

Often TMOs, independent of the category to which they belong, have two common steps. In this section we describe the common routines that are used by most, but not all, TMOs. The first step is the extraction of the luminance information from the input HDR image or frame. This is because a TMO typically works on the luminance channel, avoiding color compression. The second step is the restoration of color information in the compressed image. The implementation of these steps is shown in Listing 3.1 and Listing 3.2.
In the first step of Listing 3.1 the input image, img, is checked to see if it is composed of three color channels. Then the luminance channel is extracted using the function lum.m, under the folder ColorSpace. Note that for each TMO that will be presented in this chapter, extra input parameters for determining the appearance of the output image, imgOut,

% Is it a three color channels image?
check3Color(img);
% Luminance channel
L = lum(img);

Listing 3.1. Matlab Code: TMO first step.

are verified if they are set. If not, they are set equal to default values suggested by their authors.
In the last step of Listing 3.2, imgOut is allocated using the zeros Matlab function, initializing values to zero. Subsequently, each color component of the input image, img, is multiplied by the luminance ratio between the compressed luminance, Ld, and the original luminance, L, of img. Finally, the function RemoveSpecials.m, under the folder Util, is used to remove possible Inf or NaN values introduced by the previous steps of the TMO. This is due to the fact that a division by zero can happen when the luminance value of a pixel is zero.
An optional step is color correction. Many TMOs handle this by applying Equation (3.2) to the final output. However, we have left this extra process out because color appearance can vary substantially depending on the TMO's parameters. This function, ColorCorrection.m, under the folder ColorSpace, is shown in Listing 3.3, and applies Equation (3.2) to the input image in a straightforward way. Note that the correction value, correction, can be a single channel image for per-pixel correction.
% Removing the old luminance
imgOut = zeros(size(img));
for i=1:3
    imgOut(:,:,i) = img(:,:,i) .* Ld ./ L;
end
imgOut = RemoveSpecials(imgOut);

Listing 3.2. Matlab Code: TMO last step.

function imgOut = ColorCorrection(img, correction)
    check3Color(img);
    L = lum(img);
    imgOut = zeros(size(img));
    for i=1:3
        imgOut(:,:,i) = ((img(:,:,i) ./ L).^correction) .* L;
    end
    imgOut = RemoveSpecials(imgOut);
end

Listing 3.3. Matlab Code: The color correction step.

All implemented TMOs in this book produce linearly tone mapped values in [0, 1]. In order to display tone mapped images properly on a display, the inverse characteristic of the monitor needs to be applied. A straightforward way to do this for standard LCD and CRT monitors is to apply an inverse gamma function, typically with γ = 2.2 (in the case of the sRGB color space, γ = 2.4).
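A minimal sketch of this final step, assuming imgOut is the tone mapped image in [0, 1] produced by one of the TMOs in this chapter, is:

gamma = 2.2;                              % 2.4 in the case of sRGB
imgDisplay = ClampImg(imgOut, 0, 1).^(1 / gamma);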

3.2 Global Operators

With global operators, the same operator f is applied to all pixels of the input image, preserving global contrast. The operator may sometimes perform a first pass over the image to calculate image statistics, which are subsequently used to optimize the dynamic range reduction. Some common statistics that are typically calculated for tone mapping are the maximum luminance, the minimum luminance, and logarithmic or arithmetic average values (see Section 1.1.3). To increase robustness and to avoid outliers, these statistics are calculated using percentiles, especially for the minimum and maximum values, because these could have been affected by noise during image capture. It is relatively straightforward to extend global operators into the temporal domain. In most cases it is sufficient to temporally filter the computed image statistics, thus avoiding possible flickering artifacts due to the temporal discontinuities of the frames in the sequence. The main drawback of global operators is that, since they make use of global image statistics, they are unable to maintain local contrast and the finer details of the original HDR image.

3.2.1 Simple Mapping Methods


Simple operators are based on basic functions, such as linear scaling, logarithmic functions, and exponential functions. While they are usually fast
and simple to implement, they cannot fully compress the dynamic range
accurately.
Linear exposure is a straightforward way to visualize HDR images. The
starting image is multiplied by a factor e that is similar in concept to the
exposure used in a digital camera:
Ld (x) = eLw (x).


Figure 3.2. An example of the applications of simple operators to the cathedral


HDR image. (a) Normalization. (b) Automatic exposure. (c) Logarithmic mapping q = 0.01 and k = 1. (d) Exponential mapping q = 0.1 and k = 1. (The
original image is courtesy of Max Lyons [129].)

The user chooses e based on information that is considered interesting to visualize. When e = 1/L_{w,max}, this scaling is called normalization and it can cause a very dark appearance (see Figure 3.2(a)). If e is calculated by maximizing the number of well-exposed pixels, the scaling is called automatic exposure [154] (see Figure 3.2(b)). Ward [222] proposed to calculate e by matching the threshold visibility in an image and a display using threshold-versus-intensity (TVI) functions. However, a simple linear scale cannot compress the dynamic range of the scene, hence it shows only a slice of information.
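For example, a minimal sketch of linear exposure with normalization, assuming Lw holds the HDR luminance channel, is:

e  = 1 / max(Lw(:));                      % normalization; other choices of e are possible
Ld = ClampImg(e * Lw, 0, 1);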
Logarithmic mapping applies a logarithm function to HDR values. The base is the maximum value of the HDR image in order to map nonlinear values into the range [0, 1]. The operator is defined as

L_d(\mathbf{x}) = \frac{\log_{10}\big(1 + q L_w(\mathbf{x})\big)}{\log_{10}\big(1 + k L_{w,\max}\big)},   (3.3)

where q \in [1, \infty) and k \in [1, \infty) are constants selected by the user for determining the desired appearance of the image.
if (~exist('q_logarithmic') || ~exist('k_logarithmic'))
    q_logarithmic = 1;
    k_logarithmic = 1;
end
% check for q_logarithmic >= 1
if (q_logarithmic < 1)
    q_logarithmic = 1;
end
% check for k_logarithmic >= 1
if (k_logarithmic < 1)
    k_logarithmic = 1;
end
% Maximum luminance value
LMax = max(max(L));
% Dynamic Range Reduction
Ld = log10(1 + L * q_logarithmic) / log10(1 + LMax * k_logarithmic);

Listing 3.4. Matlab Code: Logarithmic TMO.

Listing 3.4 provides the Matlab code of the logarithmic mapping operator. The full code can be found in the file LogarithmicTMO.m. The variables q_logarithmic and k_logarithmic are respectively equivalent to the parameters q and k in Equation (3.3). Figure 3.2(c) shows an example of an image tone mapped using the logarithmic mapping operator with q = 0.01 and k = 1.
Exponential mapping applies an exponential function to HDR values. It remaps values into the interval [0, 1], where each value is divided by the logarithmic average. The operator is defined as

L_d(\mathbf{x}) = 1 - \exp\left(-\frac{q L_w(\mathbf{x})}{k L_{w,H}}\right),   (3.4)
where q \in [1, \infty) and k \in [1, \infty) are constants selected by the user.
if (~exist('q_exponential') || ~exist('k_exponential'))
    q_exponential = 1;
    k_exponential = 1;
end
% check for q_exponential >= 1
if (q_exponential < 1)
    q_exponential = 1;
end
% check for k_exponential >= 1
if (k_exponential < 1)
    k_exponential = 1;
end
% Logarithmic mean calculation
Lwa = logMean(img);
% Dynamic Range Reduction
Ld = 1 - exp(-(L * q_exponential) / (Lwa * k_exponential));

Listing 3.5. Matlab Code: Exponential TMO.


Listing 3.5 provides the Matlab code of the exponential mapping technique. The full code can be found in the file ExponentialTMO.m. The variables q_exponential and k_exponential are respectively equivalent to the parameters q and k in Equation (3.4). Figure 3.2(d) shows the use of the exponential mapping operator with q = 0.1 and k = 1.
Both exponential and logarithmic mapping can deal with medium dynamic range content reasonably well. However, these operators struggle when attempting to compress full HDR content. This can result in a very dark or bright appearance of the image, low preservation of global contrast, and an unnatural look.

3.2.2 Brightness Reproduction


One of the first TMOs in the field of computer graphics was proposed by Tumblin and Rushmeier [202]. This was revised by Tumblin et al. [204]. This operator is inspired by the HVS and adopts Stevens and Stevens' work on brightness [193, 194]. The TMO is defined as

L_d(\mathbf{x}) = m\, L_{da} \left(\frac{L_w(\mathbf{x})}{L_{w,H}}\right)^{\gamma(L_{w,H})/\gamma(L_{da})},   (3.5)
where L_{da} is the adaptation luminance of the display (30-100 cd/m² for LDR displays). The function γ(x) is the Stevens and Stevens contrast sensitivity function for a human adapted to a luminance value x, which is defined as

\gamma(x) = \begin{cases} 1.855 + 0.4 \log_{10}(x + 2.3 \times 10^{-5}) & \text{for } x \leq 100\ \text{cd/m}^2,\\ 2.655 & \text{otherwise.} \end{cases}   (3.6)
Finally, m is the adaptation-dependent scaling term, which prevents anomalously gray night images. It is defined as

m = C_{\max}^{(\gamma_{wd} - 1)/2}, \qquad \gamma_{wd} = \frac{\gamma(L_{w,H})}{1.855 + 0.4 \log_{10}(L_{da})},   (3.7)

where C_{max} is the maximum contrast of the display.
This operator compresses HDR images, preserving brightness and producing plausible results when calibrated luminance values are available. Note that this TMO needs gamma correction to avoid dark images, even though it already includes a power function (see the exponent in Equation (3.5)). Figure 3.3 shows some results.
Listing 3.6 provides the Matlab code of Tumblin and Rushmeier's operator [202]. The full code can be found in the file TumblinRushmeierTMO.m.



Figure 3.3. An example of Tumblin and Rushmeier's operator [202] applied to the Bottles HDR image. (a) A tone mapped image with a display adaptation luminance of 30 cd/m². (b) A tone mapped image with a display adaptation luminance of 100 cd/m².

The method takes the following parameters of the display as input: the luminance adaptation (Lda) and the maximum contrast (CMax). If the input parameters are not given by the user, the default values of 80 cd/m² for the luminance adaptation and 100 for the maximum contrast of the display are assigned. The luminance adaptation of the input image is computed as the logarithmic mean using the function logMean.m. It can be found in the Tmo/util folder (Listing 3.7).
The delta value is used to avoid singularities (log(0)), which occur when values of the input luminance are equal to 0. The next step is to compute the Stevens and Stevens contrast sensitivity function γ(x) for
% default parameters
if (~exist('Lda') || ~exist('CMax'))
    Lda = 80;
    CMax = 100;
end
% Logarithmic mean calculation
if (~exist('Lwa'))
    Lwa = exp(mean(mean((log(L + 2.3 * 1e-5)))));
end
% Range reduction
gamma_w = gammaTumRushTMO(Lwa);
gamma_d = gammaTumRushTMO(Lda);
gamma_wd = gamma_w ./ (1.855 + 0.4 * log10(Lda));
mLwa = (sqrt(CMax)).^(gamma_wd - 1);
Ld = Lda * mLwa .* (L ./ Lwa).^(gamma_w ./ gamma_d);

Listing 3.6. Matlab Code: Tumblin and Rushmeier's TMO [202].


function Lav = logMean(img)
    delta = 1e-6;
    Lav = exp(mean(mean(log(img + delta))));
end

Listing 3.7. Matlab Code: Logarithmic mean computation function.

the luminance adaptation of the two observers (real-world input image and displayed image). This is calculated using the function gammaTumRushTMO.m, which implements Equation (3.6). It can be found in the Tmo/util folder (Listing 3.8).
Afterwards, the adaptation-dependent scaling term m is calculated as in Equation (3.7), which corresponds to mLwa in the code. Finally, the exponent γ and the compressed luminance Ld are computed (Equation (3.5)). The final step is the normalization of the luminance computed in the previous step (Listing 3.9).
function val = gammaTumRushTMO(x)
    val = zeros(size(x));
    indx = find(x <= 100);
    if (max(size(indx)) > 0)
        val(indx) = 1.855 + 0.4 * log10(x(indx) + 2.3 * 1e-5);
    end
    indx = find(x > 100);
    if (max(size(indx)) > 0)
        val(indx) = 2.655;
    end
end

Listing 3.8. Matlab Code: Stevens and Stevens contrast sensitivity function
(gammaTumRushTMO.m).

% Normalization
imgOut = imgOut /100;

Listing 3.9. Matlab Code: Normalization step for Tumblin and Rushmeier's TMO [202].

3.2.3 Quantization Techniques


An operator based on rational functions was proposed by Schlick [189]. This provides a straightforward and intuitive approach to tone mapping. The TMO is defined as

L_d(\mathbf{x}) = \frac{p\, L_w(\mathbf{x})}{(p - 1)\, L_w(\mathbf{x}) + L_{w,\max}},   (3.8)


where p \in [1, \infty), and can be automatically estimated as

p = \frac{L_0\, L_{w,\max}}{2^N\, L_{w,\min}}.   (3.9)

The variable N is the number of bits of the output display, and L_0 is the lowest luminance value of a monitor that can be perceived by the HVS. The use of p in Equation (3.9) is a uniform quantization process, since the same function is applied to all pixels. A nonuniform quantization process can be adopted by using a spatially varying p', determining, for each pixel of the image, a local adaptation:

p' = p \left( 1 - k + k\, \frac{L_{w,\mathrm{avg}}(\mathbf{x})}{\sqrt{L_{w,\max}\, L_{w,\min}}} \right),   (3.10)
where k \in [0, 1] is a weight of nonuniformity that is chosen by the user, and L_{w,avg}(\mathbf{x}) is the average intensity of a given zone surrounding the pixel. The behavior of this nonuniform process is commonly associated with a local operator. The authors suggested a value of k equal to 0.5, which is used in all their experiments. They also proposed three different techniques to compute the average intensity value L_{w,avg}(\mathbf{x}) (for more details refer to [189]). This nonuniform process is justified by the fact that the human eye moves continuously from one point to another in an image. For each point on which the eye focuses there exists a surrounding zone that creates


Figure 3.4. An example of quantization techniques applied to the Stanford Memorial Church HDR image. (a) Uniform technique using automatic estimation for
p; see Equation (3.9). (b) Nonuniform technique with k = 0.33. (c) Nonuniform
technique with k = 0.66. (d) Nonuniform technique with k = 0.99. (The original
HDR image is courtesy of Paul Debevec [50].)



Figure 3.5. Log-plot of the quantization techniques applied to the Stanford


Memorial Church HDR image.

a local adaptation and modifies the luminance perception. Furthermore, Schlick tried to test Chiu's local method [35] in his model for including more than one pixel in the local adaptation. However, this local method was not included in the end because too many artifacts were generated in the final tone mapped pictures [189]. The quantization techniques provide a simple and computationally fast TMO. However, user interaction is needed to specify the appropriate k value for each image (see Figure 3.4). Figure 3.5 shows the logarithmic plot of the uniform and nonuniform quantization techniques varying the parameter k.

if (~exist('schlick_mode') || ~exist('schlick_p') || ~exist('schlick_bit') || ~exist('schlick_dL0') || ~exist('schlick_k'))
    schlick_mode = 'standard';
    schlick_p = 1 / 0.005;
end
% Max Luminance value
LMax = max(max(L));
% Min Luminance value
LMin = min(min(L));
if (LMin <= 0.0)
    ind = find(L > 0.0);
    LMin = min(min(L(ind)));
end
% Mode selection
switch schlick_mode
    case 'standard'
        p = schlick_p;
        if (p < 1)
            p = 1;
        end
    case 'calib'
        p = schlick_dL0 * LMax / (2^schlick_bit * LMin);
    case 'nonuniform'
        p = schlick_dL0 * LMax / (2^schlick_bit * LMin);
        p = p * (1 - schlick_k + schlick_k * L / sqrt(LMax * LMin));
end
% Dynamic Range Reduction
Ld = p .* L ./ ((p - 1) .* L + LMax);

Listing 3.10. Matlab Code: Schlick TMO [189].

Listing 3.10 provides the Matlab code of the Schlick TMO [189]. The full code may be found in the file SchlickTMO.m. The parameter schlick_mode specifies the type of model of the Schlick technique used. There are three cases: standard, calib, and nonuniform modes. The standard mode takes the parameter p as input from the user. The calib and nonuniform modes use the uniform and nonuniform quantization technique, respectively. The variable schlick_p is the parameter p or p' depending on the mode used, schlick_bit is the number of bits N of the output display, schlick_dL0 is the parameter L_0, and schlick_k is the parameter k. The first step is to extract the luminance channel from the image and the maximum, LMax, and the minimum luminance, LMin. These values can be used for calculating p. Afterwards, based on the selected mode, one of the three modalities is chosen, and either the parameter p is given by the user (standard mode) or it is computed using Equation (3.9) or Equation (3.10). Finally, the dynamic range of the luminance channel is reduced by applying Equation (3.8).

3.2.4 A Model of Visual Adaptation


Ferwerda et al. [68] presented a TMO based on psychophysical experiments, which was subsequently extended by Durand and Dorsey [61]. This operator models aspects of the HVS such as changes in threshold visibility, color appearance, visual acuity, and sensitivity over time, which depend on adaptation mechanisms of the HVS (see Figure 3.6). This is achieved by using TVI functions for modeling cones (T_p, photopic vision) and rods



Figure 3.6. An example of the operator proposed by Ferwerda et al. [68], varying the mean luminance of the HDR image. (a) 0.01 cd/m². (b) 0.1 cd/m². (c) 1 cd/m². (d) 10 cd/m². (e) 100 cd/m². Note that colors vanish when decreasing the mean luminance due to the nature of scotopic vision. (The original HDR image is courtesy of Ahmet Oğuz Akyüz.)

(T_s, scotopic vision). These functions are defined as

\log_{10} T_p(x) = \begin{cases} -0.72 & \text{if } \log_{10} x \leq -2.6,\\ \log_{10} x - 1.255 & \text{if } \log_{10} x \geq 1.9,\\ (0.249 \log_{10} x + 0.65)^{2.7} - 0.72 & \text{otherwise,} \end{cases}

and

\log_{10} T_s(x) = \begin{cases} -2.86 & \text{if } \log_{10} x \leq -3.94,\\ \log_{10} x - 0.395 & \text{if } \log_{10} x \geq -1.44,\\ (0.405 \log_{10} x + 1.6)^{2.18} - 2.86 & \text{otherwise.} \end{cases}

The operator is a simple linear scale of each color channel for simulating photopic conditions, added to an achromatic term for simulating scotopic conditions. The operator is given by

\begin{bmatrix} R_d(\mathbf{x}) \\ G_d(\mathbf{x}) \\ B_d(\mathbf{x}) \end{bmatrix} = m_c(L_{da}, L_{wa}) \begin{bmatrix} R_w(\mathbf{x}) \\ G_w(\mathbf{x}) \\ B_w(\mathbf{x}) \end{bmatrix} + m_r(L_{da}, L_{wa}) \begin{bmatrix} L_w(\mathbf{x}) \\ L_w(\mathbf{x}) \\ L_w(\mathbf{x}) \end{bmatrix},   (3.11)
where L_{da} is the luminance adaptation of the display, L_{wa} is the luminance adaptation of the image, and m_r and m_c are two scaling factors that depend on the TVI functions. They are defined as

m_r(L_{da}, L_{wa}) = \frac{T_p(L_{da})}{T_p(L_{wa})}, \qquad m_c(L_{da}, L_{wa}) = \frac{T_s(L_{da})}{T_s(L_{wa})}.

The authors suggested that Lwa = Lw, max and Lda = 0.5Ld, max , where
Ld, max is the maximum luminance level of the display.
Durand and Dorsey [61] extended the operator to work in the mesopic range based on the work of Walraven and Valeton [213]. Moreover, a time-dependent mechanism was introduced using data from Adelson [5] and Hayhoe [83].
The TMO proposes the simulation of many aspects of the HVS, but the reduction of the dynamic range is achieved through a simple linear scale that cannot compress the dynamic range much.
if (~exist('LdMax') || ~exist('Lda'))
    LdMax = 100;
    Lda = 30;
end
if (Lda < 0)
    Lda = LdMax / 2;
end
% Logarithmic mean calculation
Lwa = logMean(img);
% Contrast reduction
mR = TpFerwerda(Lda) / TpFerwerda(Lwa);
mC = TsFerwerda(Lda) / TsFerwerda(Lwa);
k = ClampImg((1 - (Lwa / 2 - 0.01) / (10 - 0.01))^2, 0, 1);
% Removing the old luminance
imgOut = zeros(size(img));
vec = [1.05, 0.97, 1.27];
for i=1:3
    imgOut(:,:,i) = (mC * img(:,:,i) + vec(i) * mR * k * L);
end
imgOut = imgOut / LdMax;

Listing 3.11. Matlab Code: Ferwerda et al. TMO [68].

Listing 3.11 provides the Matlab code of the Ferwerda et al. operator [68]. The full code may be found in the file FerwerdaTMO.m. The method takes the following parameters of the display as input: the maximum display luminance LdMax and the adapted display luminance Lda (L_{da}). The first step is to calculate the luminance adaptation of the input image. Then, the scaling factors for the scotopic and photopic vision are calculated using the functions TsFerwerda.m and TpFerwerda.m, respectively. These functions can be found in the Tmo/util folder. Listing 3.12 and Listing 3.13 provide the Matlab code for the TVI functions for the scotopic and photopic ranges, respectively, where the input is the luminance adaptation. Note that the TVI functions work in log10 space; therefore, a linear output is achieved by applying an exponentiation in base 10.
The parameter k, responsible for the simulation of the transition from mesopic to photopic range, is subsequently computed and clamped between 0 and 1. Finally, Equation (3.11) is applied. Note that the chromaticities for the three RGB color channels are stored in vec. A normalization step of the output values of the TMO is required to have the final output values between 0 and 1. This is done by dividing the output values by the maximum luminance of the display device, LdMax.
function val = TsFerwerda(x)
    x2 = log10(x);
    if (x2 <= -3.94)
        val = -2.86;
    else
        if (x2 >= -1.44)
            val = x2 - 0.395;
        else
            val = (0.405 * x2 + 1.6)^2.18 - 2.86;
        end
    end
    val = 10^val;
end

Listing 3.12. Matlab Code: TVI function for the scotopic range [68].


function val = TpFerwerda(x)
    x2 = log10(x);
    if (x2 <= -2.6)
        val = -0.72;
    else
        if (x2 >= 1.9)
            val = x2 - 1.255;
        else
            val = (0.249 * x2 + 0.65)^2.7 - 0.72;
        end
    end
    val = 10^val;
end

Listing 3.13. Matlab Code: TVI function for the photopic range [68].

3.2.5 Histogram Adjustment


The classic imaging technique of histogram equalization [74] was modified and applied to tone mapping by Larson et al. [110], who also included simulations of aspects of the HVS, such as glare, loss of acuity, and color sensitivity.
Firstly, the operator calculates the histogram of the image, I, in log2 space, using a number of bins, nbin. Larson et al. state that 100 bins are
adequate for accurate results. At this point, the cumulative histogram P is computed as
$$P(x) = \frac{\sum_{i=1}^{x} I(i)}{T}, \qquad T = \sum_{i=1}^{n_{bin}} I(i), \quad (3.12)$$
where x is a bin. Note that the cumulative histogram is an integration, while the histogram is its derivative with an appropriate scale:
$$\frac{\partial P(x)}{\partial x} = \frac{I(x)}{T \Delta x}, \qquad \Delta x = \frac{\log(L_{w,\max}/L_{w,\min})}{n_{bin}}. \quad (3.13)$$

The histogram is subsequently equalized. A classic equalization, such as
$$\log(L_d(\mathbf{x})) = \log(L_{d,\min}) + P(\log L_w(\mathbf{x})) \log(L_{d,\max}/L_{d,\min}), \quad (3.14)$$
exaggerates contrast in large areas of the image, due to compression of the range in areas with few samples, and expansion in very populated ones; see Figure 3.7 for an example of the operator. They applied this


Figure 3.7. An example of the Histogram adjustment by Larson et al. [110] to the
IDL HDR image. (a) The histogram of the HDR image. (b) The tone mapped
image.

straightforward approach, which can be expressed as
$$\frac{\partial L_d}{\partial L_w} \leq \frac{L_d}{L_w}. \quad (3.15)$$
The differentiation of Equation (3.14), using Equation (3.13), and applying Equation (3.15) leads to
$$\frac{\partial L_d}{\partial L_w} = \frac{e^{\log(L_d)}\, f(\log(L_w))\, \log(L_{d,\max}/L_{d,\min})}{T \Delta x\, L_w} \leq \frac{L_d}{L_w},$$
which is reduced to a condition on f(x):
$$f(x) \leq c, \qquad \text{where } c = \frac{T \Delta x}{\log(L_{d,\max}/L_{d,\min})}. \quad (3.16)$$

This means that exaggeration may occur when Equation (3.16) is not satisfied. A solution is to truncate f(x), which has to be done iteratively to avoid changes in T and, subsequently, changes in c. Note that the histogram in Figure 3.7(a) is truncated in [−2.5, 0.5]. The operator introduces some mechanisms to mimic the HVS, such as limitation of contrast, acuity, and color sensitivity. These are in part inspired by Ferwerda et al.'s [68] work.
In summary, the operator presents a modified histogram equalization for HDR images that achieves good range compression and overall contrast, simulating some aspects of the HVS.
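Before looking at the full implementation, the following fragment sketches the naive equalization of Equations (3.12) and (3.14) without the ceiling of Equation (3.16). It is only an illustration: Lw is assumed to be a world luminance matrix, and the display limits are illustrative values, not those of the Toolbox code.

nBin  = 100;
llw   = log(Lw(Lw > 0));                                   % log luminance of valid pixels
edges = linspace(min(llw), max(llw), nBin + 1);
I     = histc(llw, edges);                                 % histogram of log luminance
P     = cumsum(I) / sum(I);                                % cumulative histogram, Equation (3.12)
LdMin = 1;  LdMax = 100;                                   % assumed display range (cd/m^2)
% Equation (3.14): map each pixel through the cumulative histogram
Pq    = interp1(edges, P, log(max(Lw, exp(min(llw)))), 'linear', 'extrap');
Ld    = exp(log(LdMin) + Pq * log(LdMax / LdMin));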
if(~exist('nBin'))
    nBin = 256;
end

if(nBin < 1)
    nBin = 256;
end

% The image is downsampled
[n, m] = size(L);
maxCoord = max([n, m]);
viewAngleWidth  = 2 * atan(m / (2 * maxCoord * 0.75));
viewAngleHeight = 2 * atan(n / (2 * maxCoord * 0.75));
fScaleX = (2 * tan(viewAngleWidth / 2) / 0.01745);
fScaleY = (2 * tan(viewAngleHeight / 2) / 0.01745);
L2 = imresize(L, [round(fScaleY), round(fScaleX)], 'bilinear');
LMax = max(max(L2));
LMin = min(min(L2));

if(LMin <= 0.0)
    LMin = min(L2(find(L2 > 0.0)));
end

% Log space
Llog  = log(L2);
LlMax = log(LMax);
LlMin = log(LMin);

% Display characteristics in cd/m^2
LdMax  = 100;
LldMax = log(LdMax);
LdMin  = 1;
LldMin = log(LdMin);

% function P
p = zeros(nBin, 1);
delta = (LlMax - LlMin) / nBin;
for i=1:nBin
    indx = find(Llog > (delta * (i - 1) + LlMin) & Llog <= (delta * i + LlMin));
    p(i) = numel(indx);
end

% Histogram ceiling
p = histogram_ceiling(p, delta / (LldMax - LldMin));

% Calculation of P(x)
Pcum = cumsum(p);
Pcum = Pcum / max(Pcum);

% Calculate tone mapped luminance
x = (LlMin:(LlMax - LlMin) / (nBin - 1):LlMax);
pps = spline(x, Pcum);
Ld = exp(LldMin + (LldMax - LldMin) * ppval(pps, log(L)));
Ld = (Ld - LdMin) / (LdMax - LdMin);

Listing 3.14. Matlab Code: Larson et al. TMO [110].

Listing 3.14 provides the Matlab code of the Larson et al. TMO [110]. The full code may be found in the file WardHistAdjTMO.m. The method takes the number of bins, nBin, as input and, if it is below 1, a value of 256 bins is assigned. The first step is to scale the HDR input image down so that pixels roughly correspond to 1° squares. This simulates the eye adaptation for the best view in the fovea [110]. The scale factors fScaleX and fScaleY are first computed and then, with the Matlab function imresize.m, the downscaled image is stored in L2. Then, the statistics Lw,max (LMax) and Lw,min (LMin) for the downscaled image are extracted. At this point, L2, LMin, and LMax are converted to logarithmic scale to best capture the luminance population and subjective response over a wide dynamic range.
Afterwards, the characteristics of the display are defined and converted to logarithmic scale. The dynamic range of the display is fixed between 0.01 and 1. The next step is the computation of the cumulative frequency distribution P using Equation (3.12). N is the number of pixels in L2, and delta is the size of each bin. The histogram is built by finding the elements of L2 that belong to each bin. After histogram construction, Equation (3.16) is applied to perform the histogram ceiling. The Matlab code for the histogram ceiling is depicted in Listing 3.15. This follows the pseudo-code from the original paper [110] and can be found in the file histogram_ceiling.m in the Tmo/util folder.

function H = histogram_ceiling(H, k)
    tolerance = sum(H) * 0.025;
    trimmings = 0;
    val = 1;
    n = length(H);
    while((trimmings <= tolerance) & val)
        trimmings = 0;
        T = sum(H);
        if(T < tolerance)
            val = 0;
        else
            ceiling = T * k;
            for i=1:n
                if(H(i) > ceiling)
                    trimmings = trimmings + H(i) - ceiling;
                    H(i) = ceiling;
                end
            end
        end
    end
end

Listing 3.15. Matlab Code: Histogram ceiling.


At this point, it is possible to compute the cumulative function P (Equation (3.12)). The variable T maintains the number of samples, and the
computation of Pcum corresponds to Equation (3.12). Finally, P is used to
tone map the dynamic range of the luminance L.

3.2.6 Time-Dependent Visual Adaptation


While the HVS adapts naturally to large changes in luminance intensity in a
scene, this adaptation process is not immediate but takes time proportional
to the amount of variation in the luminance levels. Pattanaik et al. [168]
proposed a time-dependent TMO to take into account this important effect of the HVS. Furthermore, the method is more general than Ferwerda et al.'s [68], being based on results in psychophysics, physiology, and color science.
The complete pipeline of the model is depicted in Figure 3.8. The first step of the TMO is to convert the RGB input image into luminance values for rods (Lrod) and cones (Lcone), which are the CIE standard Y′ and Y, respectively. Then, the retinal response (RS) of rods and cones is calculated as in Hunt's model [88]:
$$RS_{rod}(\mathbf{x}) = B_{rod}\,\frac{L_{rod}(\mathbf{x})^n}{L_{rod}(\mathbf{x})^n + \sigma_{rod}^n}, \qquad RS_{cone}(\mathbf{x}) = B_{cone}\,\frac{L_{cone}(\mathbf{x})^n}{L_{cone}(\mathbf{x})^n + \sigma_{cone}^n}, \quad (3.17)$$
where n = 0.73 [88]. The parameter σ is the half-saturation parameter and is computed for rods and cones as shown in Equation (3.18) and Equation (3.19):
$$\sigma_{rod} = \frac{2.5874\, G_{rod}}{19000\, j^2 G_{rod} + 0.2615\,(1 - j^2)^4 G_{rod}^{1/6}}, \qquad \sigma_{cone} = \frac{12.9223\, G_{cone}}{k^4 G_{cone} + 0.171\,(1 - k^4)^4 G_{cone}^{1/3}}, \quad (3.18)$$
where
$$j = \frac{1}{10^5\, G_{rod} + 1} \qquad \text{and} \qquad k = \frac{1}{5\, G_{cone} + 1}.$$
The value B is the bleaching parameter defined as
$$B_{cone} = \frac{2 \times 10^6}{2 \times 10^6 + G_{cone}}, \qquad B_{rod} = \frac{0.004}{0.004 + G_{rod}}. \quad (3.19)$$

Figure 3.8. The pipeline of the adaptation operator by Pattanaik et al. [168].

Gcone and Grod are parameters at adaptation time for a particular luminance value. To have a dynamic model, Gcone and Grod need to be time dependent: Gcone(t) and Grod(t). Firstly, the steady state Gcone and Grod are computed as one-fifth of the paper-white reflectance patch in the Macbeth checker, as suggested by Hunt [88], but other methods are possible, such as the one-degree weighting method used in Larson et al. [110]. As pointed out by the authors, which method to use depends mostly on the application. Secondly, the time dependency is modeled using two exponential filters with output feedback, $1 - e^{-t/t_0}$, where $t_{0,rod} = 150$ ms and $t_{0,cone} = 80$ ms. Note that colors are simply modified to take into account the range compression in Equation (3.17), as
$$\begin{bmatrix} R'(\mathbf{x}) \\ G'(\mathbf{x}) \\ B'(\mathbf{x}) \end{bmatrix} = \frac{S(\mathbf{x})}{L_{cone}(\mathbf{x})} \begin{bmatrix} R_w(\mathbf{x}) \\ G_w(\mathbf{x}) \\ B_w(\mathbf{x}) \end{bmatrix}, \qquad S(\mathbf{x}) = \frac{n\, B_{cone}\, L_{cone}(\mathbf{x})^n\, \sigma_{cone}^n}{\big(L_{cone}(\mathbf{x})^n + \sigma_{cone}^n\big)^2}.$$
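As an illustration of the time dependency, a minimal sketch of the exponential filter with output feedback is given below. All numerical values except the two time constants quoted above are assumptions chosen only for the example, and the variable names are hypothetical.

% illustrative previous and steady-state adaptation values (assumptions)
Gprev_cone = 10;  Gnew_cone = 100;
Gprev_rod  = 10;  Gnew_rod  = 100;
t0_cone = 80;  t0_rod = 150;           % time constants in ms, as quoted above
dt = 1000 / 30;                        % frame time for an assumed 30 fps video
Gcone = Gprev_cone + (1 - exp(-dt / t0_cone)) * (Gnew_cone - Gprev_cone);
Grod  = Gprev_rod  + (1 - exp(-dt / t0_rod))  * (Gnew_rod  - Gprev_rod);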

Afterwards, a color appearance model (CAM) is applied to compute the scene appearance values Qlum (luminance appearance), Qcolor (color appearance), Qspan (width appearance), and Qmid (midrange appearance) from the RS computed for the rod and cone photoreceptors (adaptation model). The final step is to invert the appearance model and the adaptation model to match the viewing condition of ordinary office lighting, where an LDR display is used to visualize the output image.
The operator is a simple, global, time-dependent, and fully automatic TMO that lacks some secondary effects of the HVS (glare, acuity, etc.); see Figure 3.9 for an example. This model was extended by Ledda et al. [113],


Figure 3.9. An example of the method by Pattanaik et al. [168], where the
initial viewer adaptation is 0.05 cd/m2 and the luminance mean is 30 cd/m2 .
(a) Adaptation after 0 seconds. (b) Adaptation after 5 seconds. (c) Adaptation
after 10 seconds. (d) Full adaptation after 110 seconds. (The original HDR image
is courtesy of Paul Debevec.)

who proposed a local model and fully automatic visual adaptation for static
images and videos. Additionally, they presented a psychophysical validation study of the TMO using an HDR Monitor [190]. Their results showed
a strong correlation between tone mapped images displayed on an LDR
monitor and linear HDR images shown on an HDR monitor. A further
extension of Pattanaik et al.'s [168] operator was proposed by Irawan et
al. [91] who combined the method with the histogram adjustment operator
by Larson et al. [110]. Their work simulates visibility in time-varying, high
dynamic range scenes for observers with impaired vision. This is achieved
using a temporally coherent histogram adjustment method combined with
an adaptive TVI function based on the measurements of Naka and Rushton [152].

3.2.7 Adaptive Logarithmic Mapping


Drago et al. [60] presented a global operator based on logarithmic mapping. The authors pointed out how an HDR image tone mapped with logarithmic operators using different bases leads to different results. Figure 3.10 demonstrates the differences between an image tone mapped with a logarithm of base 2 and one of base 10. While the logarithm with base 10 allows maximum compression of high luminance values, the logarithm with base 2 provides good contrast and preserves details in areas of dark and medium luminance. It is interesting to notice that neither of these two images provides a satisfying result. Based on these results, Drago et al. have proposed a tone mapping operator that can combine the results of two such images through


Figure 3.10. Results of logarithmic mapping using two dierent bases. (a) With
base 2. (b) With base 10.

adaptive adjustment of logarithmic bases depending on the input pixel's radiance.
A basic property of the logarithmic function allows an arbitrary choice of logarithmic base as follows:
$$\log_{base}(x) = \frac{\log_d(x)}{\log_d(base)}, \quad (3.20)$$
and a smooth interpolation between logarithmic bases is performed making use of the Perlin and Hoffert bias power function [171]:
$$bias_b(t) = t^{\log(b)/\log(0.5)}. \quad (3.21)$$
The tone mapping function is finally derived by inserting Equation (3.21) into the denominator of Equation (3.20):
$$L_d(\mathbf{x}) = \frac{L_{d,\max}}{100\, \log_{10}(1 + L_{w,\max})}\; \frac{\log(1 + L_w(\mathbf{x}))}{\log\!\left(2 + 8\left(\frac{L_w(\mathbf{x})}{L_{w,\max}}\right)^{\log(b)/\log(0.5)}\right)}, \quad (3.22)$$
where b ∈ [0, 1] is a user parameter that adjusts the compression of high values and the visibility of details in dark areas. A suggested value is b equal to 0.85; Ld,max is the maximum luminance of the display device, where


Figure 3.11. An example of the adaptive logarithmic mapping, varying p. (a) p = 0.65. (b) p = 0.75. (c) p = 0.85. (d) p = 0.95.

a common value for an LDR display is 100 cd/m2 (see Figure 3.11). The
required luminance values are Lw (x) (world luminance) and the maximum
luminance of the scene Lw, max , which need to be scaled by the world luminance adaptation Lw, a and an optional exposure factor. Finally, gamma


if(~exist('Drago_Ld_Max'))
    Drago_Ld_Max = 100;
end

if(~exist('Drago_b'))
    Drago_b = 0.85;
end

% Max luminance
LMax = max(max(L));

constant  = log(Drago_b) / log(0.5);
constant2 = (Drago_Ld_Max / 100) / (log10(1 + LMax));

Ld = constant2 * log(1 + L) ./ log(2 + 8 * ((L / LMax).^constant));

Listing 3.16. Matlab Code: Drago et al. TMO [60].

correction is applied to the tone mapped data to compensate for the nonlinearity of the display device. This is achieved by deriving a transfer function based on the ITU-R BT.709 standard. The TMO provides a computationally fast mapping that allows a good global range compression. However, its global nature, as with other global methods, does not always allow for the preservation of fine details.
Listing 3.16 provides the Matlab code of Drago et al.s [60] operator.
The full code can be found in the file DragoTMO.m. The method takes as
input the maximum luminance of the display device, Drago Ld Max, and
the bias parameter, Drago b. The first step is to extract the maximum
luminance Lmax from the input image, then two constants are computed
that will be used in the final tone mapping function. The first variable
constant corresponds to the exponent log(b)/log(0.5) of Equation (3.22), i.e., the exponent of the bias power function in Equation (3.21). The second constant, constant2, is the scale factor on the left of Equation (3.22). Finally, Equation (3.22) is applied to the luminance channel L.
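The effect of the bias parameter can be visualized directly from Equation (3.22). The fragment below is a small sketch that plots the curve for a few values of b; the assumed scene maximum luminance LMax is an illustrative value.

LdMax = 100;  LMax = 1e4;              % display peak and an assumed scene maximum
Lw = logspace(-2, 4, 500);             % world luminance samples
figure; hold on;
for b = [0.65, 0.85, 0.95]
    e  = log(b) / log(0.5);            % bias exponent of Equation (3.21)
    Ld = (LdMax / 100) / log10(1 + LMax) * log(1 + Lw) ./ log(2 + 8 * (Lw / LMax).^e);
    plot(log10(Lw), Ld);
end
xlabel('log_{10} L_w'); ylabel('L_d');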

3.2.8 Encoding of High Dynamic Range Video with a Model of Human Cones
Van Hateren [80] proposed a new tone mapping operator based on a model of human cones [79], which can be inverted to encode HDR images and videos. The TMO and its inverse work in troland units (td), a measure of the retinal illuminance I, which is derived from the scene luminance in cd/m² multiplied by the pupil area in mm². Van Hateren proposed a temporal and a static version of this TMO.


Figure 3.12. The pipeline for range compression (green) and range expansion
(red) proposed by Van Hateren [80].

The temporal TMO is designed for HDR videos and presents low-pass temporal filters for removing photon and source noise (see Figure 3.12). The TMO starts by simulating the absorption of I by the visual pigment, which is modeled by two low-pass temporal filters and described in terms of a differential equation:
$$\tau \frac{dy}{dt} + y = x,$$
where τ is a time constant, and x(t) and y(t) are the input and the output, respectively, at time t. At this point, a strong nonlinear function is applied to the result of the low-pass filters, E, for simulating the breakdown of cyclic guanosine monophosphate (cGMP) by enzymes (cGMP is a nucleotide that controls the current across the cell membranes):
$$\alpha = \frac{1}{\beta} = \frac{1}{c_\beta + k_\beta E},$$
where $k_\beta E$ is the light-dependent activity of an enzyme, and $c_\beta$ the residual activity. The breakdown of cGMP is counteracted by the production of cGMP, a highly nonlinear feedback loop under control of intercellular calcium. This system is modeled by a filtering loop that outputs the current across the cell membrane, Ios (the final tone mapped value), by the outer segment of a cone.
Van Hateren showed that the range expansion is quite straightforward by inverting the feedback loop. However, the process cannot be fully inverted because the first two low-pass filters are difficult to invert, so the result is I′ ≈ I. In order to fully invert the process for inverse tone mapping purposes, Van Hateren proposed a steady version of the TMO (a global TMO) defined as
$$I_{os} = \frac{1}{\big(1 + (a_C I_{os})^4\big)\big(c_\beta + k_\beta I\big)},$$


Figure 3.13. An example of Van Hateren's algorithm applied to the HDR Bottles image. (a) Original at f-stop 0. (b) Tone mapped frame using the cone model. (c) Reconstructed frame using the proposed iTMO.

which can be easily inverted as
$$I = \frac{1}{k_\beta}\left(\frac{1}{I_{os}\big(1 + (a_C I_{os})^4\big)} - c_\beta\right).$$
Van Hateren applied this TMO and its inverse to uncalibrated HDR movies and images, which were scaled by the logarithmic mean. The results showed that the method does not need gamma correction, removes noise, and presents light adaptation (see Figure 3.13). The main drawbacks of the TMO are the introduction of motion blur in movies and the limited dynamic range that it can handle (10,000:1), which causes saturation in very dark and bright regions. Finally, the author does not provide any study on companding and further quantization for his TMO/iTMO.
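Because the steady-state equation above defines Ios implicitly, a simple way to evaluate it is by fixed-point iteration. The following fragment is only a sketch under that assumption (it is not the author's implementation), and the parameters aC, cB, and kB are passed in by the caller.

% A minimal sketch: solving I_os = 1 / ((1 + (aC*I_os)^4) * (cB + kB*I))
% by fixed-point iteration; parameter values must be supplied by the caller.
function Ios = VanHaterenSteadyStateSketch(I, aC, cB, kB)
    Ios = 1 ./ (cB + kB * I);          % initial guess: ignore the feedback term
    for iter = 1:50                    % a few dozen iterations are typically enough
        Ios = 1 ./ ((1 + (aC * Ios).^4) .* (cB + kB * I));
    end
end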

3.3 Local Operators

Local operators improve the quality of the tone mapped image over global
operators by attempting to reproduce both the local and the global contrast. This is achieved by having f , the mapping operator, take into account the intensity values from the neighboring pixels of the pixel being
tone mapped. However, neighbors have to be chosen carefully; otherwise,
halos around edges can appear. Halos are sometimes desired when attention needs to be given to a particular area [128], but if the phenomenon is
uncontrolled it can produce unpleasant images.

3.3.1 Spatially Nonuniform Scaling


Chiu et al. [35] proposed the first operator that attempted to preserve local contrast. The TMO scales the world luminance by the average of


Figure 3.14. An example of the local TMO introduced by Chiu et al. [35] applied to the Stanford Memorial Church HDR image. (a) The simple operator with σ = 3. While local contrast is preserved, the global contrast is completely lost, resulting in a flat appearance of the tone mapped image. (b) The simple operator with σ = 27. In this case both local and global contrast are kept, but halos are quite extensive in the image. (c) The TMO with clamping and σ = 27. Halos are reduced but not completely removed. (d) The full TMO with glare simulation and σ = 27; note that the glare masks halos. (The original HDR image is courtesy of Paul Debevec [50].)

neighboring pixels, which is defined as
$$L_d(\mathbf{x}) = L_w(\mathbf{x})\, s(\mathbf{x}), \quad (3.23)$$
where s(x) is the scaling function that is used to compute the local average of the neighboring pixels, defined as
$$s(\mathbf{x}) = \big(k\, (L_w \otimes G_\sigma)(\mathbf{x})\big)^{-1},$$
where G_σ is a Gaussian filter and k is a constant that scales the final output. One issue with this operator is that while a small σ value produces a very low contrast image (see Figure 3.14(a)), a high σ value generates halos in the image (see Figure 3.14(b)). Halos are caused at edges between very bright areas and very dark ones, which means that s(x) > Lw(x)⁻¹. To alleviate this, pixel values are clamped to Lw(x)⁻¹ if s(x) > Lw(x)⁻¹. At this point s can still have artifacts in the form of steep gradients where s(x) = Lw(x)⁻¹. A solution is to smooth s iteratively with a 3×3 Gaussian filter (see Figure 3.14(d)). Finally, the operator masks the remaining halo artifacts simulating glare, which is modeled by a low-pass filter.
The operator presents the first local solution, but it is quite computationally expensive at alleviating halos (around 1,000 iterations for the smoothing step). There are many parameters that need to be tuned, and


% default parameters
if(~exist('k')|~exist('sigma')|~exist('clamping')|~exist('glare')|~exist('glare_n'))
    k = 8;
    [r, c, col] = size(img);
    sigma = round(16 * max([r, c]) / 1024) + 1;
    clamping = 500;
    glare = 0.8;
    glare_n = 8;
    glare_width = 121;
end

% Check parameters
if(k <= 0) k = 8; end
if(sigma <= 0) sigma = round(16 * max([r, c]) / 1024) + 1; end

% Calculating S
blurred = RemoveSpecials(1./(k * GaussianFilter(L, sigma)));

% Clamping S
if(clamping > 0)
    iL = RemoveSpecials(1./L);
    indx = find(blurred >= iL);
    blurred(indx) = iL(indx);

    % Smoothing S
    H2 = [0.080, 0.113, 0.080;...
          0.113, 0.227, 0.113;...
          0.080, 0.113, 0.080];

    for i=1:clamping
        blurred = imfilter(blurred, H2, 'replicate');
    end
end

% Dynamic range reduction
Ld = L .* blurred;

Listing 3.17. Matlab Code: Chiu et al.s operator [35].

finally halos are reduced but not completely removed by the clamping, the smoothing step, and the glare.
Listing 3.17 provides the Matlab code of Chiu et al.'s operator [35]. The full code may be found in the file ChiuTMO.m. The method takes as input the parameters of the scaling function, such as the scaling factor k; sigma, which represents the standard deviation of the Gaussian filter G_σ; clamping, which is the number of iterations for reducing the halo artifacts; and the parameters for the glare filtering, such as glare, which is a constant factor, and the exponent of the glare filter glare_n. After verifying the user-set parameters, the first step is to prepare the Gaussian filter (G_σ in Equation (3.23)) and apply it on the HDR luminance, L, of the input


image, img. The reciprocal of the filtering result is stored in the variable blurred. In the case where the variable clamping is higher than 0, the iterative process to reduce the halo artifacts is required. This is performed only on the pixels that are still not clamped to 1 after the previous filtering process (smoothing constraint). indx stores the indices of the pixels in the blurred input HDR luminance that are still above 1. In order to respect the smoothing constraint, we substitute in blurred the pixels with index indx with the values of the variable iL at the same index indx.
The variable iL stores the inverted values of the input HDR luminance, which correspond to the output of the first filtering step in case s(x) > Lw(x)⁻¹. In this way only these pixel values will be filtered iteratively. The smoothing step is finalized by applying the filter H2 to the updated blurred variable. Once the scaling function is computed and stored in blurred, the dynamic range reduction is obtained by applying Equation (3.23). Glare is computed if the glare constant factor is higher than 0. This is performed in an empirical way where only the blooming effect is considered (Listing 3.18).
The idea is that a pixel in a filtered image should retain some constant factor glare of the original luminance value, where glare is less than 1. The remaining 1-glare is a weighted average of the surrounding pixels, where adjacent pixels contribute more. This is performed with a square root filter stored in H3. The width of the filter is the default value used in the original paper [121]. Finally, to take the glare into account, the filter in Listing 3.18 is applied on the reduced dynamic range stored in Ld.
in Ld.
if(glare > 0)
    % Calculation of a kernel with a square root shape for simulating glare
    window2 = round(glare_width / 2);
    [x, y] = meshgrid(-1:1/window2:1, -1:1/window2:1);
    H3 = (1 - glare) * (abs(sqrt(x.^2 + y.^2) - 1)).^glare_n;
    H3(window2 + 1, window2 + 1) = 0;

    % Circle of confusion of the kernel
    H3(find(sqrt(x.^2 + y.^2) > 1)) = 0;

    % Normalisation of the kernel
    H3 = H3 / sum(sum(H3));
    H3(window2 + 1, window2 + 1) = glare;

    % Filtering
    Ld = imfilter(Ld, H3, 'replicate');
end

Listing 3.18. Matlab Code: Glare implementation of Chiu et al. [35].


3.3.2 A Multiscale Model of Adaptation and Spatial Vision for Realistic Image Display
Pattanaik et al. [167] proposed the first spatially varying color appearance model (CAM) for the reproduction of HDR images. This CAM is based on previous psychophysical research arranged in a coherent framework. The key fact here, of interest to us, is that visual processing can be described by filtering at different scales [235].
The algorithm processes images in the long, middle, and short (LMS) wavelength cone and rod responses (R). The cone and rod responses are processed spatially. Firstly, the image is decomposed into a stack of seven Gaussian filtered images. Secondly, for each channel, a stack of difference of Gaussian (DoG) images is computed. Gain control functions (G) for cones and rods are applied at each level, which converts the contrast to adapted contrast. Then, a transducer function, T, is applied to each adapted DoG to model psychophysically derived human spatial contrast sensitivity functions, as described in Watson and Solomon [226]. Finally, chromatic and achromatic channels are calculated following the Hunt model [88]. In order to map the model on an LDR display, T and G are inverted for the viewing condition of a target display. Subsequently, they are applied respectively to the LMS components of the image. T is applied first, followed by G.


Figure 3.15. An example of the multiscale model by Pattanaik et al. [167] applied
to a scene varying the scale. (a) 4 pixels size kernel. (b) 64 pixels size kernel.


Finally, the image is reconstructed, combining the stack of images. Note that the range reduction is performed by the functions G and T.
The multiscale observer model is a CAM designed for dynamic range reduction. The model, due to its local nature, can simulate many aspects of the HVS, such as visual acuity, change in threshold visibility, color discrimination, and colorfulness. An example of the results of the operator is shown in Figure 3.15. However, halos can occur if the starting kernel for the decomposition is not chosen accurately. Furthermore, the model is computationally expensive due to the high number of filtering operations and intermediate states of processing that are needed. Similar algorithms to the multiscale observer model are multiscale methods based on Retinex theory [108], such as Rahman's work [177], and an extension to HDR imaging [144] (see Section 3.3.5).
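As a minimal sketch of the multiscale decomposition step only (not the full color appearance model), the following fragment builds a stack of Gaussian filtered images and their differences; L is assumed to be a luminance matrix, the starting kernel width is an illustrative choice, and imgaussfilt requires the Image Processing Toolbox.

nLevels = 7;  sigma = 1;                                   % illustrative starting width
Gstack  = zeros([size(L), nLevels]);
for i = 1:nLevels
    Gstack(:,:,i) = imgaussfilt(L, sigma * 2^(i-1));       % Gaussian filtered images
end
DoG = Gstack(:,:,1:end-1) - Gstack(:,:,2:end);             % difference-of-Gaussian stack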

3.3.3 Photographic Tone Reproduction


A local operator based on photographic principles was presented by Reinhard et al. [180]. This method simulates the burning and dodging effect that photographers have applied for more than a century. In particular, the operator is inspired by the Zone System presented by Adams [3].
The global component of the operator is a function that mainly compresses high luminance values,
$$L_d(\mathbf{x}) = \frac{L_m(\mathbf{x})}{1 + L_m(\mathbf{x})}, \quad (3.24)$$
where Lm is the original luminance scaled by $a L_{w,H}^{-1}$, and a is the chosen exposure for developing the film in the photographic analogy. Lw,H is the logarithmic average, which is an approximation of the scene key value. The key value indicates subjectively if the scene is light, normal, or dark and is used in the zone system for predicting how scene luminance will map to a set of print zones [180]. Note that in Equation (3.24), while high values are compressed, others are scaled linearly. However, Equation (3.24) does not allow bright areas to be burnt out, as a photographer could do during the developing of a film for enhancing contrast. Therefore, Equation (3.24) can be modified as seen in Equation (3.25):
$$L_d(\mathbf{x}) = \frac{L_m(\mathbf{x})\left(1 + \frac{L_m(\mathbf{x})}{L_{\mathrm{white}}^2}\right)}{1 + L_m(\mathbf{x})}. \quad (3.25)$$
The value Lwhite is the smallest luminance value that is mapped to white and is equal to Lm,max by default. If Lwhite < Lm,max, values that are greater than Lwhite are clamped (burnt in the photography analogy).


A local operator can be defined for Equation (3.24) and Equation (3.25). This is achieved by finding the largest local area without sharp edges, thus avoiding halo artifacts. This area can be detected by comparing different-sized Gaussian filtered Lm images. If the difference is very small or tends to zero, there is no edge; otherwise there is. The comparison is defined as
$$\left|\frac{L_\sigma(\mathbf{x}) - L_{\sigma+1}(\mathbf{x})}{2^\phi a\, \sigma^{-2} + L_\sigma(\mathbf{x})}\right| < \epsilon, \quad (3.26)$$
where $L_\sigma(\mathbf{x}) = (L_m \otimes G_\sigma)(\mathbf{x})$ is a Gaussian filtered image at scale σ, and ε is a small value greater than zero. Note that the filtered images are normalized so as to be independent of absolute values, the term $2^\phi a\, \sigma^{-2}$ avoids singularities, and a and φ are the key value and the sharpening parameter, respectively. Once the largest scale σmax that satisfies Equation (3.26) is calculated for each pixel, the global operators can be modified to be local.
calculated for each pixel, the global operators can be modied to be local.
For example, Equation (3.24) is modified as
$$L_d(\mathbf{x}) = \frac{L_m(\mathbf{x})}{1 + L_{\sigma_{\max}}(\mathbf{x})}, \quad (3.27)$$

Figure 3.16. An example of the photographic tone reproduction operator by Reinhard et al. [180]. (a) The local operator of Equation (3.28) with φ = 4, ε = 0.05, and Lwhite = 10^6 cd/m². (b) The local operator of Equation (3.28) with φ = 4, ε = 0.05, and Lwhite set to a value similar to the window luminance. Note that this setting burns out the window, allowing more control to photographers.

and similarly for Equation (3.25),
$$L_d(\mathbf{x}) = \frac{L_m(\mathbf{x})\left(1 + \frac{L_m(\mathbf{x})}{L_{\mathrm{white}}^2}\right)}{1 + L_{\sigma_{\max}}(\mathbf{x})}, \quad (3.28)$$

where $L_{\sigma_{\max}}(\mathbf{x})$ is the average luminance computed over the largest neighborhood (σmax) around the image pixel. An example of Equation (3.28) can be seen in Figure 3.16, where the burning parameter Lwhite is varied.
The photographic tone reproduction operator is a local operator that preserves edges, avoiding halo artifacts. Another advantage is that it does not need calibrated images as input.
Listing 3.19, Listing 3.20, and Listing 3.21 provide the Matlab code of the Reinhard et al. [180] TMO. The full code may be found in the file ReinhardTMO.m. The method takes as input the parameter pAlpha, which is the value of the exposure of the image a; the smallest luminance that will be mapped to pure white, pWhite, corresponding to Lwhite; a Boolean value pLocal to decide which operator to apply (0 - global or 1 - local); and the sharpening parameter phi corresponding to φ.
The first part of the code computes the scaling luminance step (Listing 3.19). First, the user-set input parameters are verified. Afterwards, the luminance is read from the HDR input image, and the logarithmic average is computed and stored in Lwa. Finally, the luminance is scaled and stored in L.
The local step is performed in case the Boolean pLocal variable is set to 1. The scaled luminance, L, is filtered using the Matlab function ReinhardGaussianFilter.m, and the condition in Equation (3.26) is used to identify the scale sMax (which represents σmax) that contains the largest neighborhood around a pixel. Finally, L_adapt stores the value of $L_{\sigma_{\max}}(\mathbf{x})$.
if(~exist('pWhite')||~exist('pAlpha')||~exist('pLocal')||~exist('phi'))
    pWhite = 1e20;
    pAlpha = 0.18;
    pLocal = 1;
    phi = 8;
end

% Logarithmic mean calculation
Lwa = logMean(img);

% Scale luminance using alpha and logarithmic mean
L = (pAlpha * L) / Lwa;

Listing 3.19. Matlab Code: Scaling luminance component of Reinhard et al. TMO [180].

if(pLocal)
    % precomputation of 9 filtered images
    sMax = 9;
    [r, c] = size(L);
    Lfiltered = zeros(r, c, sMax);
    LC = zeros(r, c, sMax);
    alpha1 = 1 / (2 * sqrt(2));
    alpha2 = alpha1 * 1.6;
    constant = (2^phi) * pAlpha;
    sizeWindow = 1;
    for i=1:sMax
        s = round(sizeWindow);
        V1 = ReinhardGaussianFilter(L, s, alpha1);
        V2 = ReinhardGaussianFilter(L, s, alpha2);
        % normalized difference of Gaussian levels
        LC(:,:,i) = RemoveSpecials((V1 - V2) ./ (constant / (s^2) + V1));
        Lfiltered(:,:,i) = V1;
        sizeWindow = sizeWindow * 1.6;
    end

    % threshold is a constant for solving the band-limited
    % local contrast LC at a given image location.
    epsilon = 0.0001;

    % adaptation image
    L_adapt = L;
    for i=sMax:-1:1
        ind = find(LC(:,:,i) < epsilon);
        if(~isempty(ind))
            L_adapt(ind) = Lfiltered(r * c * (i - 1) + ind);
        end
    end
end

Listing 3.20. Matlab Code: Local step of Reinhard et al. TMO [180].

In the final step, pWhite is set to the maximum luminance of the HDR input image scaled by $a L_{w,H}^{-1}$, and the final compression of the dynamic range is performed. This is equivalent to Equation (3.25) or Equation (3.27), depending on whether the global or local operator is used.
pWhite2 = pWhite * pWhite;

% Range compression
if(pLocal)
    Ld = L ./ (1 + L_adapt);
else
    Ld = (L .* (1 + L / pWhite2)) ./ (1 + L);
end

Listing 3.21. Matlab Code: Last step of Reinhard et al. TMO [180].
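The global part of the operator can also be written down directly from Equations (3.24) and (3.25). The fragment below is a minimal sketch under the assumption that Lw is a world luminance matrix; it is not the Toolbox implementation.

a      = 0.18;                                      % chosen exposure (key value)
Lwa    = exp(mean(mean(log(Lw + 1e-6))));           % logarithmic average L_{w,H}
Lm     = (a / Lwa) * Lw;                            % scaled luminance
Lwhite = max(Lm(:));                                % default burning threshold
Ld     = (Lm .* (1 + Lm / Lwhite^2)) ./ (1 + Lm);   % Equation (3.25)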


3.3.4 Tone Mapping Algorithm for High Contrast Images


An operator that has a similar mechanism to the Photographic Tone Reproduction operator, in terms of preserving edges and avoiding halos, has
been presented by Ashikhmin [17]. Ashikhmin proposed two dynamic range
compression equations depending on whether the final goal is preserving local contrast or preserving visual contrast.
In the case when preserving the local contrast is the main goal, Equation (3.29) is used:
$$L_d(\mathbf{x}) = \frac{L_w(\mathbf{x})\, f(L_{w,a}(\mathbf{x}))}{L_{w,a}(\mathbf{x})}, \quad (3.29)$$
where f is the tone mapping function, $L_{w,a}(\mathbf{x})$ is the local luminance adaptation, and $L_w(\mathbf{x})$ is the luminance at pixel location x. When preserving the visual contrast is the goal, Equation (3.30) is used:
$$L_d(\mathbf{x}) = f(L_{w,a}(\mathbf{x})) + \frac{TVI\big(f(L_{w,a}(\mathbf{x}))\big)}{TVI\big(L_{w,a}(\mathbf{x})\big)}\,\big(L_w(\mathbf{x}) - L_{w,a}(\mathbf{x})\big), \quad (3.30)$$
where TVI is a simplified threshold vs. intensity function, as seen in Equation (3.31):
$$C(x) = \begin{cases} x/0.0014 & \text{if } x \le 0.0034,\\ 2.4483 + \log(x/0.0034)/0.4027 & \text{if } 0.0034 \le x \le 1.0,\\ 16.5630 + (x - 1.0)/0.4027 & \text{if } 1.0 \le x \le 7.2444,\\ 32.0693 + \log(x/7.2444)/0.0556 & \text{otherwise}, \end{cases} \quad (3.31)$$

where x is a luminance value in cd/m². The tone mapping function f is based on the principle that the perceptual scale has to be uniform. Therefore, world luminance is mapped into display luminance according to their relative positions in the corresponding perceptual scales [17]:
$$L_d(\mathbf{x}) = f(L_w(\mathbf{x})) = L_{d,\max}\,\frac{C(L_w(\mathbf{x})) - C(L_{w,\min})}{C(L_{w,\max}) - C(L_{w,\min})}, \quad (3.32)$$
where Ld,max is the maximum luminance of the display device (usually 100 cd/m²). The estimation of the local adaptation luminance $L_{w,a}(\mathbf{x})$ is based on a principle that balances two requirements: keeping the local contrast signal within a reasonable bound while maintaining enough information about image details [17]. This principle leads to averaging over the largest neighborhood that is sufficiently uniform without generating excessive contrast signals (visualized as artifacts). To identify a uniform neighborhood, increasing its size must not significantly affect its average


Figure 3.17. A comparison between the TMO by Reinhard et al. [180] and the one by Ashikhmin [17] applied to the Bottles HDR image. (a) The local operator of Reinhard et al. [180]. (b) The local operator of Ashikhmin [17]. Note that details are similarly preserved in both images; the main difference is in the global tone function.

value. This measure is the local contrast $l_c$ computed at a specific location, and it is taken as the ratio of the difference of two low-pass filtered images to one of them:
$$l_c(s, \mathbf{x}) = \frac{G_s(L)(\mathbf{x}) - G_{2s}(L)(\mathbf{x})}{G_s(L)(\mathbf{x})},$$
where $G_s(L)$ is the output of applying a Gaussian filter of width s to the input image. The neighborhood criterion is to find at every pixel the smallest s ∈ [1, smax] that solves the equation |lc(s)| = 0.5. The value of smax should be set to one degree of visual field as found in psychophysical experiments; however, the authors found that a reasonable fixed value for the maximum filter size, such as smax = 10 pixels, is usually adequate.
It is interesting to note the similarities of this operator with the one proposed by Reinhard et al. [180]. In particular, the similarity is in the philosophy behind the identification of the largest neighborhood around the pixel for the local luminance adaptation. Figure 3.17 shows two images
tone mapped with the two operators.
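A direct transcription of the capacity function in Equation (3.31) is shown below as a sketch; the Toolbox function TVI_Ashikhmin.m may differ in details, and the natural logarithm is used here because it makes the pieces of Equation (3.31) join continuously.

% A sketch of the capacity function C(x) of Equation (3.31) for a scalar x.
function C = CapacityAshikhminSketch(x)
    if x <= 0.0034
        C = x / 0.0014;
    elseif x <= 1.0
        C = 2.4483 + log(x / 0.0034) / 0.4027;
    elseif x <= 7.2444
        C = 16.5630 + (x - 1.0) / 0.4027;
    else
        C = 32.0693 + log(x / 7.2444) / 0.0556;
    end
end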
Listing 3.22 and Listing 3.23 provide the Matlab code of the Ashikhmin TMO [17]. The full code can be found in the file AshikhminTMO.m.
if(~exist('pLocal'))
    pLocal = 1;
end

if(~exist('LdMax'))
    LdMax = 100;
end

% Local calculation?
if(pLocal)
    % precompute 10 filtered images
    % sMax should be one degree of visual angle; the value is set as in
    % the original paper
    sMax = 10;
    [r, c] = size(L);
    Lfiltered = zeros(r, c, sMax); % filtered images
    LC = zeros(r, c, sMax);
    for i=1:sMax
        Lfiltered(:,:,i) = GaussianFilterWindow(L, i + 1);
        % normalized difference of Gaussian levels
        LC(:,:,i) = RemoveSpecials(abs(Lfiltered(:,:,i) - GaussianFilterWindow(L, (i + 1) * 2)) ./ Lfiltered(:,:,i));
    end

    % threshold is a constant for solving the band-limited
    % local contrast LC at a given image location.
    threshold = 0.5;

    % adaptation image
    L_adapt = -ones(size(L));
    for i=1:sMax
        ind = find(LC(:,:,i) < threshold);
        L_adapt(ind) = Lfiltered(r * c * (i - 1) + ind);
    end

    % set the maximum level
    ind = find(L_adapt < 0);
    L_adapt(ind) = Lfiltered(r * c * (sMax - 1) + ind);

    % Remove the detail layer
    Ldetail = RemoveSpecials(L ./ L_adapt);
    L = L_adapt;
end

Listing 3.22. Matlab Code: Local step of Ashikhmin TMO [17].

The method takes the parameter LdMax (Ld,max), which is the maximum luminance of the display device, and a Boolean variable, pLocal, which is used to identify if the local adaptation luminance estimation must be computed so as to apply the local behavior of the TMO. After having verified if the input parameters are given by the user, the local luminance adaptation is computed. This consists of applying a Gaussian filter to the HDR input luminance at different sizes from s = 1 to s = smax and computing the local contrast, which is stored in the variable LC. This is done in the first for loop inside the conditional for the local computation in Listing 3.22. The Matlab function GaussianFilterWindow.m is used for the application of the Gaussian filter onto the input HDR luminance, with the luminance and the filter size passed as input. This Matlab function


% Robust maximum and minimum
maxL = MaxQuart(L, 0.9995);
minL = MaxQuart(L, 0.0005);

% Range compression
maxL_TVI = TVI_Ashikhmin(maxL);
minL_TVI = TVI_Ashikhmin(minL);
Ld = LdMax * (TVI_Ashikhmin(L) - minL_TVI) / (maxL_TVI - minL_TVI);

% Local recombination
if(pLocal)
    Ld = Ld .* Ldetail;
end

Listing 3.23. Matlab Code: Tone mapping curve of Ashikhmin TMO [17].

can be found in the Tmo/util folder. The RemoveSpecials.m Matlab function is used to remove special values that may be generated during the computation of the local contrast LC. The local luminance adaptation starts with the computation of the local contrast LC, where sMax (smax) Gaussian levels are computed and the local contrast measure is applied at each level (first for loop). Once the local contrast LC is computed, at each level i the indices of the pixels in LC that are below the threshold are computed. The corresponding elements of the previously computed Gaussian filtered image, Lfiltered, are taken as the luminance adaptation L_adapt. After this step, a verification is conducted to check whether some elements of L_adapt have not been modified. Since L_adapt had been initialized to -1, this is verified by testing if the L_adapt elements are below 0. For these elements, the corresponding element of the filtered image at the maximum level is assigned. Afterwards, the input luminance L is divided by the luminance adaptation L_adapt, which is equivalent to removing the details of the input HDR image, Ldetail. The luminance adaptation L_adapt is afterwards stored in L for further use.
Once the local luminance adaptation has been computed and stored in L, the tone curve (Listing 3.23) is applied. The first step is to extract the maximum (maxL) and minimum (minL) luminance from the local luminance adaptation. For this step the Matlab function MaxQuart.m, which can be found in the util folder, is used. Afterwards, the TVI_Ashikhmin function is applied on the maximum and minimum luminance using the Matlab function TVI_Ashikhmin.m, which can be found within the Tmo/util folder. The variable Ld stores the result of the tone mapping curve, as in Equation (3.32). Finally, the details are merged back in the case that the TMO used is the local one.


3.3.5 A Retinex-Based Operator


Meylan et al. [144] have proposed a Retinex-based adaptive filter to be applied in the context of tone mapping. Figure 3.18 shows the flow chart of the method proposed by Meylan et al. [144]. The method uses parallel processing to handle luminance and chrominance separately. Principal component analysis (PCA) is applied to the input image to decorrelate the RGB channels into three principal components. The first component carries the luminance information (achromatic channel). This is justified by the fact that the PCA intrinsically leads to an opponent representation of colors [144].
A global compression is carried out on the luminance and on the linearized version of the RGB image I. This is similar to the adaptation of photoreceptors, which is approximated by a power function whose slope depends on the mean luminance in the field of view, as described in [15]. Afterwards, the Retinex-based adaptive filter (see Appendix B) is applied to the globally corrected luminance in the logarithmic domain, while a logarithm is applied to the globally corrected RGB image. The Retinex filter consists of the traditional surround-based methods, where a new value is computed from the difference, in the logarithmic domain, between the pixel under consideration and the value of a mask. The mask is a weighted average of the pixels surrounding the area under consideration. To overcome the typical drawback of surround-based methods, such as the inability to keep pure black and white for low contrast areas, the authors introduced a weighting factor that weights the mask and ensures the conservation of white and black areas. The weighting factor is a typical sigmoid function.
The use of an adaptive filter allows the avoidance of the usual halo artifacts typical of surround-based Retinex methods. To merge back the

Figure 3.18. Flow chart of the Retinex-based adaptive filter TMO introduced by Meylan et al. [144].


Figure 3.19. An example of the Retinex operator by Meylan et al. [144] using
color enhancement.

color information, the PCA is applied to the log-encoded image, and the principal component is replaced by the new luminance value obtained as output of the Retinex-based adaptive filter. Then, the chrominance channels are weighted by a factor that helps to compensate for the loss of saturation that is partially generated by working in the logarithmic domain. Since this operation is similar for all images, the authors suggested a weighting factor of 1.6, which was found to be suitable during their experiments. An example of the results of this operator is shown in Figure 3.19.
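The decorrelation step described above can be sketched with plain Matlab, computing the principal components of the RGB pixels and taking the first one as the achromatic channel. This is only an illustration of the PCA step, not Meylan et al.'s full method; img is assumed to be the input HDR image.

[r, c, ~] = size(img);
X   = reshape(img, r * c, 3);          % one RGB triplet per row
mu  = mean(X, 1);
Xc  = X - repmat(mu, r * c, 1);        % zero-mean data
[V, D] = eig(cov(Xc));                 % eigenvectors of the 3x3 covariance
[~, idx] = sort(diag(D), 'descend');
V   = V(:, idx);                       % sort by decreasing variance
PC  = Xc * V;                          % principal components
Lpc = reshape(PC(:, 1), r, c);         % first component: achromatic channel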

3.4 Frequency-Based Operators

Frequency-based operators have the same goal of preserving edges and local
contrast as local operators. In the case of frequency operators, as the name
implies, this is achieved by computing in the frequency domain instead of
the spatial domain. The main observation for such methods is that edges
and local contrast are preserved if and only if a complete separation between
large features and details is achieved.

3.4.1 Low Curvature Image Simplifier Operator


Tumblin and Turk tried to solve the problem of preserving local contrast without the loss of fine details and texture by replicating the technique used by an artist when painting a scene [203]. In general, an artist would, when drawing a scene, start with a sketch of large features (i.e., boundaries around the most important large features) and gradually refine these large features, adding more details (i.e., adding more shading and boundaries). The


Figure 3.20. Comparison between a band pass and the LCIS filter. (a) The band pass filter does not completely separate fine details from large features. (b) LCIS avoids this problem and artifacts are not generated.

proposed algorithm builds a similar hierarchy by making use of a partial differential equation inspired by anisotropic diffusion, called low curvature image simplifier (LCIS), to attempt to compress only the largest features such that the fine details will be preserved.
In Figure 3.20, the LCIS hierarchy filter is compared to a linear hierarchy filter. The linear filter (see Figure 3.20(a)) is unable to separate the large features from the fine details. As a consequence of this, some large features may not be compressed because they are mixed with the finer details, resulting in halos characterized by gradient reversal artifacts. The LCIS approach, being able to separate the large features from the fine details, avoids this problem (see Figure 3.20(b)). LCIS removes the details, leaving the smoothly shaded regions separated by sharp boundaries (we may call this new image the simplified image). The details can be recovered by subtracting the simplified image from the original HDR input image. At this point, detailed regions are completely separated from large feature regions. The latter can now be compressed. After the compression step, the details are added back with gentle or no compression.
In Figure 3.21, the LCIS hierarchy (multiscale approach) used for the detail-preserving contrast method implemented by Tumblin and Turk [203] is shown. Preserved scene details are extracted with progressive sets of LCIS settings. The luminance is first converted to logarithmic scale with base 10, so that the difference between pixels corresponds to contrast.
Afterwards, a set of LCIS filters with progressively larger K values is applied to produce a set of simplified images. Here K = 0 means that the LCIS has no effect on the input image, and the simplified image with the highest K is the simplest image. The details at each level (det0, ..., det2) are obtained by subtracting the simplified image at level i from that at level i − 1. The detail layers and the base are then compressed and


Figure 3.21. The LCIS hierarchy used to reduce the contrast and preserve details
in [203]. (The original HDR image is courtesy of Gregory J. Ward.)

multiplied by a set of weights and added to an exponential to get the output intensities. This TMO is the first one that attempts the separation between large features and details to avoid artifacts in the output image. One of the major drawbacks is the time required to tweak the input parameters such as K, the weights w, and the time-steps.

3.4.2 Fast Bilateral Filtering Operator


The bilateral filter (see Appendix A) is a nonlinear filter that can separate an image into a high frequency image, called the detail layer, and a low frequency image with preserved edges, called the base layer. Durand and Dorsey exploited this property when designing a general and efficient tone mapping framework [62], which preserves local contrast and is inspired by Tumblin and Turk's work on LCIS [203].
The pipeline of this method can be seen in Figure 3.22. The first step of this framework is to decompose an HDR image into luminance and chromaticity. At this point, the luminance in the logarithmic domain is filtered with the bilateral filter, and the detail layer is calculated by dividing the luminance by the filtered one. The filtered luminance is subsequently tone mapped using a global TMO. The authors used Tumblin and Rushmeier's operator [202], but any TMO is suitable for the range reduction. Finally,


Figure 3.22. The pipeline of the fast bilateral filtering operator.

the tone mapped base layer, the detail layer, and the chromaticity are recombined to form the final tone mapped image.
Durand and Dorsey presented a speed-up of the filtering using an approximated bilateral filter and down-sampling. However, this technique has been made obsolete by the introduction of new acceleration methods for the bilateral filter (see Appendix A). The framework can preserve most of the fine details and can be applied to any global TMO. Figure 3.23 provides an example. A problem associated with this method is that halos are not completely removed. An improvement to this framework was proposed by


Figure 3.23. A comparison of tone mapping with and without using the framework proposed by Durand and Dorsey [62] applied to the Bottles HDR image. (a) Tumblin and Rushmeier's operator. (b) The bilateral framework using Tumblin and Rushmeier's operator for the compression of the base layer; note that fine details are enhanced.


% default parameters
if(~exist('Lda')|~exist('CMax'))
    Lda = 80;
    CMax = 100;
end

% Chroma
for i=1:3
    img(:,:,i) = RemoveSpecials(img(:,:,i) ./ L);
end

% Fine details and base separation
[Lbase, Ldetail] = BilateralSeparation(L);

% Tumblin-Rushmeier TMO
for i=1:3
    img(:,:,i) = img(:,:,i) .* Lbase;
end
imgOut = TumblinRushmeierTMO(img, Lda, CMax);

% Adding details back
for i=1:3
    imgOut(:,:,i) = imgOut(:,:,i) .* Ldetail;
end

Listing 3.24. Matlab Code: Bilateral filtering [62] (the TMO used is the Tumblin and Rushmeier [202] technique).

Choudhury and Tumblin [36], where they employed the trilateral filter for the tone mapping task (Appendix A, Figure 3.24).
Listing 3.24 provides the Matlab code of the bilateral filtering framework [62]. The full code may be found in the file DurandTMO.m. The method takes as input the parameter Lda, which is the luminance adaptation of the display device, and the maximum contrast CMax. These two parameters are the parameters requested by the TMO used to compress the base layer. In the implementation provided in this book we have used the Tumblin and Rushmeier [202] technique, as described in the original Durand and Dorsey paper [62].
The first step is to store the color ratio that will be reused afterwards to restore the color information to the tone mapped image. This is stored in the img variable for all three color components. Afterwards, the core of the bilateral filtering is performed, that is, the separation between the low frequency (base) and high frequency (detail) components. This is done making use of the Matlab function BilateralSeparation.m, which can be found in the util folder. This is performed only on the luminance L; the base layer is stored in the Lbase variable and the detail layer in Ldetail. At this point the Tumblin and Rushmeier [202] TMO is applied and the


Figure 3.24. A comparison between fast bilateral filtering [62] and trilateral filtering [36] applied to the Mansion HDR image. (a) The image tone mapped with the bilateral filter. (b) The image tone mapped with the trilateral filter. Note that details are better reproduced than in (a), especially in the sky.

result is stored in imgOut. Before applying the TMO step, we restore the color ratio to the base layer Lbase, and the result is stored in the img variable. Since the dynamic range compression is performed only on the low frequency component (base layer), we give img as input to the TMO. Finally, the details are added back in the last for loop in Listing 3.24.
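The base/detail decomposition itself can be sketched in a few lines of Matlab. The fragment below is only an illustration of the idea: imbilatfilt (Image Processing Toolbox) is used here as a stand-in for the Toolbox's BilateralSeparation.m, and L is assumed to be the luminance matrix.

logL    = log10(L + 1e-6);
logBase = imbilatfilt(logL);           % edge-preserving low frequency (base) layer
Lbase   = 10.^logBase;
Ldetail = L ./ Lbase;                  % detail layer (high frequencies)
% ...compress Lbase with any global TMO, then multiply the details back:
% Ld = TMO(Lbase) .* Ldetail;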

3.4.3 Gradient Domain Compression


A different approach was proposed by Fattal et al. [67], where the gradients of the image are modified to achieve range compression. The main idea is based on the fact that drastic changes in the luminance of an HDR image are due to large magnitude gradients at a certain scale. On the other hand, fine details depend on small magnitude gradients. Therefore, range compression can be obtained by attenuating the magnitude of large gradients while keeping, or penalizing less, small gradients to preserve fine details. This method is inspired by Horn's work on recovering reflectance [85].
In the one-dimensional case, this idea is quite straightforward. If an HDR signal H(x) = log(Lw(x)) needs to be compressed, derivatives have to be scaled by a spatially varying factor Φ, which modifies the magnitude while maintaining the direction. This produces a new gradient field G(x) = H′(x)Φ(x), which is used to reconstruct the compressed signal as
$$\log(L_d(x)) = I(x) = C + \int_0^x G(t)\, dt,$$

where C is an additive constant. Note that computations are performed


in the logarithmic domain because luminance, in this domain, is an approximation of the perceived brightness, and gradients calculated in the


logarithmic domain correspond to local contrast ratios in the linear domain. This approach can be extended to the two-dimensional case; for example, G(x) = ∇H(x)Φ(x). However, G is not necessarily integrable, because there may be no I such that G = ∇I. An alternative is to minimize a function I whose gradients are closest to G:
$$\iint \big\|\nabla I(x, y) - G(x, y)\big\|^2\, dx\, dy = \iint \left[\left(\frac{\partial I(x,y)}{\partial x} - G_x(x,y)\right)^2 + \left(\frac{\partial I(x,y)}{\partial y} - G_y(x,y)\right)^2\right] dx\, dy,$$
which must satisfy the Euler-Lagrange equation, in accordance with the Variational Principle:
$$2\left(\frac{\partial I(x,y)}{\partial x} - G_x(x,y)\right) + 2\left(\frac{\partial I(x,y)}{\partial y} - G_y(x,y)\right) = 0.$$
This equation, once it is simplified and rearranged, leads to Poisson's Equation (Equation (3.33)):
$$\nabla^2 I - \nabla \cdot G = 0. \quad (3.33)$$
This is a linear partial differential equation and can be solved using a full multigrid method [175]. For Φ, Fattal et al. proposed a multiresolution definition because edges are contained at multiple scales. To avoid halos, the attenuation of edges has to be propagated from the level in which it was detected to the full resolution scale. This is achieved by firstly creating a Gaussian pyramid (H0, ..., Hd), where H0 is the full resolution HDR image and Hd is at least a 32 × 32 HDR image. Secondly, gradients are calculated for each level using central differences, as shown in Equation (3.34):
$$\nabla H_k(x, y) = \left(\frac{H_k(x+1, y) - H_k(x-1, y)}{2^{k+1}},\; \frac{H_k(x, y+1) - H_k(x, y-1)}{2^{k+1}}\right). \quad (3.34)$$
At each level k a scaling factor is defined as
$$\phi_k(x, y) = \alpha^{1-\beta}\, \big\|\nabla H_k(x, y)\big\|^{\beta - 1},$$
where α is a constant threshold, assuming β ≤ 1. While gradient magnitudes above α will be attenuated, ones below α will be slightly magnified. The final Φ is calculated by propagating the scaling factors φk from the coarsest level to the full resolution one (Equation (3.35)):
$$\Phi_i(x, y) = \begin{cases} \phi_d(x, y) & \text{if } i = d, \\ U(\Phi_{i+1})(x, y)\, \phi_i(x, y) & \text{if } i < d, \end{cases} \qquad \Phi(x, y) = \Phi_0(x, y), \quad (3.35)$$
where U is a linear up-sampling operator.
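The per-level attenuation factor can be written directly from the formula above. The following fragment is a sketch of that step only; the Toolbox's FattalPhi.m may differ in details.

% A sketch of the per-level attenuation factor phi_k, written from the
% equation above; Fx and Fy are the gradient components of one pyramid level.
function Phi = FattalPhiSketch(Fx, Fy, fAlpha, fBeta)
    gradMag = sqrt(Fx.^2 + Fy.^2);            % gradient magnitude per pixel
    gradMag(gradMag <= 0) = 1e-6;             % avoid division by zero
    Phi = (fAlpha^(1 - fBeta)) * gradMag.^(fBeta - 1);
end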


Figure 3.25. An example of tone mapping using the gradient domain operator by Fattal et al. [67], varying the parameter β, applied to the Stanford Memorial Church HDR image. (a) β = 0.7. (b) β = 0.8. (c) β = 0.9. Note that by increasing β, the overall look of the image gets darker. (The original HDR image is courtesy of Paul Debevec [50].)

This operator preserves fine details, avoiding halo artifacts by being based on differential analysis (see Figure 3.25). Moreover, the operator is a very general solution, and it can be employed for the enhancement of LDR images, preserving shadow areas and details.
Listing 3.25 and Listing 3.26 provide the Matlab code of the gradient domain compression TMO by Fattal et al. [67]. The full code may be found in the file FattalTMO.m. The method takes the parameter fBeta (β) as input.
if(~exist('fBeta')) fBeta = 0.95; end

L = log(Lori + 1e-6);

% Compute Gaussian Pyramid + Gradient
[r, c] = size(L);
numPyr = round(log2(min([r, c]))) - log2(32);
kernelX = [0, 0, 0; -1, 0, 1; 0, 0, 0];
kernelY = [0, 1, 0; 0, 0, 0; 0, -1, 0];
G = [[], struct('fx', imfilter(L, kernelX, 'same') / 2, 'fy', imfilter(L, kernelY, 'same') / 2)];
G2 = sqrt(G(1).fx.^2 + G(1).fy.^2);
fAlpha = 0.1 * mean(mean(G2));

% Generation of the pyramid
kernel = [1, 4, 6, 4, 1]' * [1, 4, 6, 4, 1];
kernel = kernel / sum(sum(kernel));
imgTmp = L;
for i=1:numPyr
    imgTmp = imresize(conv2(imgTmp, kernel, 'same'), 0.5, 'bilinear');
    Fx = imfilter(imgTmp, kernelX, 'same') / (2^(i + 1));
    Fy = imfilter(imgTmp, kernelY, 'same') / (2^(i + 1));
    G = [G, struct('fx', Fx, 'fy', Fy)];
end

Listing 3.25. Matlab Code: The initialization of the Gradient Domain Compression TMO by Fattal et al. [67].

In the first part of the code, Listing 3.25, a pyramid of gradients, G, is built on the logarithmic luminance, L. This follows Equation (3.34), where the filtering is computed using the Matlab function imfilter.m.
In the second part of the code, Listing 3.26, each level of G is scaled using the function FattalPhi.m, which is located in the Tmo/util folder. After scaling, each level is upsampled with the Matlab function imresize.m and multiplied into the next one. This process is a straightforward implementation of Equation (3.35), where the result is stored in Phi_kp1. Finally, the divergence divG is calculated, using backward differences, from the original gradients of L scaled by Phi_kp1. At this point, the Poisson
% Generation of the attenuation mask
Phi_kp1 = FattalPhi(G(numPyr + 1).fx, G(numPyr + 1).fy, fAlpha, fBeta);
for k=numPyr:-1:1
    [r, c] = size(G(k).fx);
    G2 = sqrt(G(k).fx.^2 + G(k).fy.^2);
    fAlpha = 0.1 * mean(mean(G2));
    Phi_k = FattalPhi(G(k).fx, G(k).fy, fAlpha, fBeta);
    Phi_kp1 = imresize(Phi_kp1, [r, c], 'bilinear') .* Phi_k;
end

% Calculating the divergence with backward differences
G = struct('fx', G(1).fx .* Phi_kp1, 'fy', G(1).fy .* Phi_kp1);
kernelX = [0, 0, 0; -1, 1, 0; 0, 0, 0];
kernelY = [0, 0, 0; 0, 1, 0; 0, -1, 0];
dx = imfilter(G(1).fx, kernelX, 'same');
dy = imfilter(G(1).fy, kernelY, 'same');
divG = RemoveSpecials(dx + dy);

% Solving Poisson equation
Ld = exp(PoissonSolver(divG));

Listing 3.26. Matlab Code: The solver part of the Gradient Domain
Compression TMO by Fattal et al. [67].


At this point, the Poisson equation is solved for divG using the function
PoissonSolver.m, which is located in the Tmo/util folder. The result of the
solver is exponentiated, obtaining the tone mapped luminance, Ld.

3.4.4 Compression and Companding High Dynamic Range Images with Subband Architectures
Li et al. [119] presented a general framework for tone mapping and inverse
tone mapping of HDR images based on multiscale decomposition. While
the main goal of the algorithm is tone mapping, the framework can also
compress HDR images. A multiscale decomposition splits a signal s(x)
(one-dimensional in this case) into n subbands b1(x), . . . , bn(x) with n filters
f1, . . . , fn, in a way that the signal can be reconstructed as

s(x) = \sum_{i=1}^{n} b_i(x).

Wavelets [196] and Laplacian pyramids [29] are examples of multiscale decomposition that can be used in the framework.
The main concept this method uses is based on applying a gain control
to each subband of the image to compress the range. For example, a sigmoid
expands low values and flattens peaks; however, it introduces distortions
that can appear in the final reconstructed signal. In order to avoid such
distortions, a smooth gain map inspired by neurons was proposed. The
first step is to build an activity map, reflecting the fact that the gain of
a neuron is controlled by the level of its neighbors. The activity map is
defined as

A_i(x) = G(\sigma_i) \otimes |B_i(x)|,

where G(σ_i) is a Gaussian kernel with σ_i = 2^i σ_1, which is proportional
to i, the subband's scale. The activity map is used to calculate the gain
map, which turns gain down where activity is high and vice versa (Equation (3.36)):

G_i(x) = p(A_i(x)) = \left( \frac{A_i(x) + \epsilon}{\delta_i} \right)^{\gamma - 1},  (3.36)

where γ ∈ [0, 1] is a compression factor, and ε is the noise level that prevents
the noise being seen. The equation δ_i = α_i Σ_x A_i(x)/M is the gain
control stability level, where M is the number of pixels in the image, and
α_i ∈ [0.1, 1] is a constant related to spatial frequency. Once the gain maps
are calculated, subbands can be modified, as in Equation (3.37):

B'_i(x) = G_i(x) B_i(x).  (3.37)
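
As a concrete illustration of Equations (3.36) and (3.37), the following Matlab sketch computes the activity map, the gain map, and the modified subband for a single subband B_i. The function name and its exact parameterization are assumptions for illustration only, not the authors' code; for companding, the same gain map can later be divided out again during expansion.

function Bmod = SubbandGainSketch(B, sigma_i, gamma, epsilon, alpha_i)
% Activity map: Gaussian-smoothed magnitude of the subband B_i.
    kSize = 2*ceil(3*sigma_i) + 1;
    h = fspecial('gaussian', kSize, sigma_i);
    A = imfilter(abs(B), h, 'same');
% Gain control stability level: delta_i = alpha_i * sum(A_i(x)) / M.
    delta = alpha_i*sum(A(:))/numel(A);
% Gain map, Equation (3.36): gain is turned down where activity is high.
    G = ((A + epsilon)/delta).^(gamma - 1);
% Modified subband, Equation (3.37).
    Bmod = G.*B;
end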


Note that it is possible to calculate a single activity map for all subbands
by pooling all activity maps as
A_{ag}(x) = \sum_{i=1}^{n} A_i(x).

From A_ag, a single gain map G_ag = p(A_ag) is calculated for modifying
all subbands. The tone mapped image is finally obtained by summing all
modified subbands B'_i. The compression is applied only to the V channel of an
image in the HSV color space [74]. Finally, to avoid oversaturated images,
S can be reduced by a factor in [0.5, 1]. The authors presented comparisons with
the fast bilateral filter operator [62], the photographic operator [180], and
the gradient domain operator [67] (see Figure 3.26).
The framework can be additionally used for compression, applying expansion
after tone mapping. This operation is called companding.


Figure 3.26. A comparison of tone mapping results for the Doll HDR image.
(a) The subband architecture using Wavelets. (b) The gradient domain operator.
(c) The fast bilateral filter operator. (d) The photographic operator. (Images
are courtesy of Yuanzhen Li and Edward Adelson [119].)


Figure 3.27. The optimization companding pipeline of Li et al. [119].

The expansion operation (Equation (3.38)) is obtained using a straightforward
modification of Equation (3.37):

B_i(x) = \frac{B'_i(x)}{G_i(x)}.  (3.38)

A simple companding operation is not sufficient for compression, especially
if the tone mapped image is compressed using lossy codecs. Therefore,
the companding operation needs to be iterative to determine the best values
for the gain map (see Figure 3.27). The authors proposed compressing
the tone mapped image into JPEG. In this case a high bit rate is needed
(1.5 bpp-4 bpp) with chrominance subsampling disabled to avoid the
amplification of JPEG artifacts during expansion, since a simple up-sampling
strategy is adopted.

3.5 Segmentation Operators

Recently, a new approach to the tone mapping problem has emerged in the
form of segmentation operators. Strong edges and most of the local contrast
perception are located along the borders of large uniform regions. Segmentation
operators divide the image into uniform segments, apply a global
operator to each segment, and finally merge them. One additional advantage
of such a method is that gamut modifications are minimized because
a linear operator for each segment is, in many cases, sufficient.


3.5.1 Segmentation and Adaptive Assimilation for Detail-Preserving


The first segmentation-based TMO was introduced by Yee and Pattanaik
[238]. Their operator divides the HDR image into regions and calculates
an adaptation luminance for each region. This adaptation luminance can
be used as input to a global operator.
The first step of the segmentation is to divide the image into regions,
called categories, using a histogram in the logarithmic domain. Contiguous
pixels of the same category are grouped together using a flood-fill approach.
Finally, small groups are assimilated into a bigger one, obtaining a layer.
Small and big groups are defined by two thresholds, tr_big and tr_small. The
segmentation is performed again for n_max layers, where the difference is in
the size of the bin of the histogram, which is computed as

\text{Bin}_{\text{size}}(n) = \text{Bin}_{\min} + (\text{Bin}_{\max} - \text{Bin}_{\min}) \frac{n}{n_{\max} - 1},  (3.39)
where Binsize (n) is the bin size for the nth layer, and Binmax and Binmin
are, respectively, the maximum and minimum bin size. Once all layers are
computed, the adaptation luminance is calculated as

L_a(x) = \exp\left( \frac{1}{n_{\max}} \sum_{i=1}^{n_{\max}} C_i(x) \right),  (3.40)
where Ci (x) is the average log luminance of the group to which a pixel at
coordinate x belongs. The application of La to any global TMO helps to


Figure 3.28. An example of Tumblin and Rushmeier's TMO [204] using adaptive
luminance calculated by the segmentation method by Yee and Pattanaik [238]
and applied to the Bottles HDR image. (a) The adaptation luminance calculated
using 64 layers. (b) The tone mapped image using the adaptation luminance
in (a).


preserve edges and thus avoid halos (see Figure 3.28). Note that to avoid
banding artifacts a high number of layers (more than 16) is needed.
Listing 3.27 provides the Matlab code for the Yee and Pattanaik [238]
TMO. The full code may be found in the file YeeTMO.m. The method takes
as input parameters nLayer (n_max), the number of layers used during the
segmentation step, and two parameters characteristic of the Tumblin and
Rushmeier TMO [202], which have been chosen to compress the dynamic
range of the computed local luminance adaptation in the previous step.
The Matlab code for the Tumblin and Rushmeier operator is not shown
in this section, since it has been already presented in Section 3.2.2. The
first step is to convert the input luminance into logarithmic scale in base 10
(Llog). The bilateral filtering is applied on the luminance to eliminate noise
that may create problems during the segmentation phase. Afterwards, the
luminance minimum value is extracted and stored in the variable minLLog.
At this point, the Llog is segmented in categories, which we have called
if(~exist('nLayer') | ~exist('CMax') | ~exist('Lda'))
    nLayer = 64;
    CMax = 100;
    Lda = 80;
end
% calculation of the adaptation
Llog = log10(L + 1e-6);
% Removing noise using the bilateral filter
minLLog = min(min(Llog));
maxLLog = max(max(Llog));
Llog = bilateralFilter(Llog, [], minLLog, maxLLog, 4, 0.02);
LLoge = log(L + 2.5*1e-5);
bin_size1 = 1;
bin_size2 = 0.5;
La = zeros(size(L));
for i=0:(nLayer - 1)
    bin_size = bin_size1 + (bin_size2 - bin_size1)*i/(nLayer - 1);
    segments = round((Llog - minLLog)/bin_size) + 1;
    % Calculation of layers
    [imgLabel] = CompoCon(segments, 8);
    labels = unique(imgLabel);
    for p=1:length(labels)
        % Group adaptation
        indx = find(imgLabel == labels(p));
        La(indx) = La(indx) + mean(mean(LLoge(indx)));
    end
end
La = exp(La/nLayer); La(find(La < 0)) = 0;
% Dynamic Range Reduction
imgOut = TumblinRushmeierTMO(img, Lda, CMax, La);

Listing 3.27. Matlab Code: Segmentation and local luminance adaptation
computation and range reduction step of the Yee and Pattanaik TMO [238].


segments; see the first for loop, where bin_size stores the bin size and is
equivalent to Equation (3.39). The next step involves grouping contiguous pixels
into the same segment and assimilating smaller groups into a bigger one. The
Matlab function CompoCon.m performs this task, and it may be found in
the util folder. Finally, the local luminance adaptation is computed as in
Equation (3.40).

3.5.2 Lightness Perception in Tone Reproduction


Krawczyk et al. [104] proposed an operator based on the anchoring theory
of lightness perception by Gilchrist et al. [73]. This theory states that
the highest luminance value, or anchor, in the visual field is perceived as
white by the HVS. The perception is affected by relative area. When the
highest luminance covers a small area it appears to be self-luminous. To
apply lightness theory to complex images, Gilchrist et al. [73] proposed to
decompose the image into areas, called frameworks, where the anchoring can
be applied.
The first step of the operator is to determine the frameworks. The
image histogram is then calculated in the log10 domain. The k-means clustering
algorithm is used to determine the centroids, C_i, in the histogram,
merging close centroids by a weighted averaging based on pixel count. To
avoid seams or discontinuities, frameworks are generated with a soft segmentation.
A probability function is defined for each framework. This
identifies if a pixel belongs to the framework based on the centroid, as
shown in Equation (3.41):

P_i(x) = \exp\left( -\frac{(C_i - \log_{10}(L_w(x)))^2}{2\sigma^2} \right),  (3.41)

where σ is equal to the maximum distance between two frameworks. P_i(x)
is smoothed using the bilateral filter to remove small local variations; see
Figure 3.29(b) and Figure 3.29(d). The local anchor, ω_i, for a framework,
i, is determined by computing the 95th percentile of luminance in the
framework. Finally, the tone mapped image is calculated as
L_d(x) = \log_{10}(L_w(x)) - \sum_{i=1}^{n} \omega_i P_i(x).  (3.42)

An example of the final tone mapped image can be seen in Figure 3.29.
The operator was validated by comparing it with the photographic tone
reproduction operator [180] and the fast bilateral filtering operator [62] to
test the Gelb effect [73], an illusion related to lightness constancy failure.
The result of the experiment showed that the lightness-based operator can



Figure 3.29. An example of the TMO by Krawczyk et al. [104]. (a) and (c) Frameworks
where anchoring is applied. (b) and (d) The smoothed probability maps for
(a) and (c). (e) The final tone mapped image is obtained by merging frameworks.
(The original HDR image is courtesy of Ahmet Oğuz Akyüz.)

reproduce this effect. The operator is fast and straightforward to implement,
but it needs particular care when applying it to dynamic scenes to
avoid effects such as ghosting.
Listing 3.28, Listing 3.29, Listing 3.30, and Listing 3.31 provide the
Matlab code of the Krawczyk et al. [104] TMO. The full code may be
found in the file KrawczykTMO.m.
% Calculate the histogram of the HDR image in log10 space
[histo, bound, haverage] = HistogramHDR(img, 256, 'log10', 0);


% Determine how many K clusters (number of zones)
C = bound(1):1:bound(2);
K = length(C);
if(C(K) < bound(2))
    C = [C, bound(2)];
    K = length(C);
end
% Calculation of luminance and log luminance
delta = 1e-6;
L = lum(img);
LLog10 = log10(L + delta);
% Init K-means
totPixels = zeros(size(C));
oldK = K;
oldC = C;
iter = 100; % maximum number of iterations
histoValue = (bound(2) - bound(1))*(0:(length(histo) - 1))/(length(histo) - 1) + bound(1);
histoValue = histoValue';

Listing 3.28. Matlab Code: Initialization of the k-means computation step of
Krawczyk et al. [104] TMO.

Listing 3.28 shows the first steps where the histogram of the luminance
channel, histo, is computed using the Matlab function HistogramHDR.m,
which is located in the folder util. After histogram computation, initial
frameworks are computed by calculating their centroids, C. They are set
one order of magnitude apart, between the minimum, bound(1), and
maximum, bound(2), bounds of histo.
% K-means loop
for p=1:iter
    belongC = -ones(size(histo));
    distance = 100*oldK*ones(size(histo));
    % Calculate the distance of each bin in the histogram from centroids C
    for i=1:K
        tmpDistance = abs(C(i) - histoValue);
        tmpDistance = min(tmpDistance, distance);
        indx = find(tmpDistance < distance);
        if(~isempty(indx))
            belongC(indx) = i;
            distance = tmpDistance;
        end
    end
    % Calculate the new centroids C
    C = zeros(size(C));
    totPixels = zeros(size(C));
    full = zeros(size(C));


    for i=1:K
        indx = find(belongC == i);
        if(~isempty(indx))
            full(i) = 1;
            totHisto = sum(histo(indx));
            totPixels(i) = totHisto;
            C(i) = sum((histoValue(indx).*histo(indx))/totHisto);
        end
    end
    % Remove empty frameworks
    C = C(find(full == 1));
    totPixels = totPixels(find(full == 1));
    K = length(C);
    % Is a fixed point reached?
    if(K == oldK)
        if(sum(oldC - C) <= 0)
            break
        end
    end
    oldC = C;
    oldK = K;
end

Listing 3.29. Matlab Code: k-means computation step of Krawczyk et al. [104]
TMO.

After initialization in Listing 3.28, the k-means algorithm is applied to
C until a fixed point is reached. This process is shown in Listing 3.29. This
step moves centroids and removes empty frameworks.
% Merging frameworks
iter = K*10;
for p=1:iter
    for i=1:(K - 1)
        % The distance between frameworks has to be less than 1
        if(abs(C(i) - C(i + 1)) < 1)
            tmp = totPixels(i) + totPixels(i + 1);
            C(i) = (C(i)*totPixels(i) + C(i + 1)*totPixels(i + 1))/tmp;
            totPixels(i) = tmp;
            % Removing unnecessary frameworks
            C(i + 1) = [];
            totPixels(i + 1) = [];
            K = length(C);
            break
        end
    end
end
% Calculating the minimum distance (sigma) between frameworks
sigma = 100*K;
for i=1:K
    for j=(i + 1):K
        sigma = min(sigma, abs(C(i) - C(j)));
    end
end
% Partitioning into frameworks
framework = -ones(size(L));
distance = 100*K*ones(size(L));
for i=1:K
    tmpDistance = abs(C(i) - LLog10);
    tmpDistance = min(distance, tmpDistance);
    indx = find(tmpDistance < distance);
    if(~isempty(indx))
        % assign the right framework
        framework(indx) = i;
        % updating the distance
        distance = tmpDistance;
    end
end

Listing 3.30. Matlab Code: Merging frameworks step of Krawczyk et al. [104]
TMO.

At this point, frameworks are merged if their distance is less than one order of magnitude. This straightforward operation is shown in Listing 3.30.
% Probability maps sum
sigma2 = 2*sigma^2;
tot = zeros(size(L));
A = zeros(K, 1);
sigmaArticulation2 = 2*0.33^2;
for i=1:K
    % Articulation of the framework
    indx = find(framework == i);
    maxY = max(LLog10(indx));
    minY = min(LLog10(indx));
    A(i) = 1 - exp(-(maxY - minY)^2/sigmaArticulation2);
    % The sum of probability maps for normalisation
    tot = tot + exp(-(C(i) - LLog10).^2/sigma2)*A(i);
end
% Calculating probability maps
Y = LLog10;
for i=1:K
    indx = find(framework == i);
    if(~isempty(indx))
        % Probability map
        P = exp(-(C(i) - LLog10).^2/sigma2);
        P = RemoveSpecials(P./tot);


        % Anchoring
        W = MaxQuart(LLog10(indx), 0.95);
        Y = Y - W*A(i)*P;
    end
end
% Clamp in the range [-2, 0]
Ld = ClampImg(Y, -2, 0);
% Remap values in [0, 1]
Ld = (10.^(Ld + 2))/100;

Listing 3.31. Matlab Code: Probability maps calculation step of Krawczyk et
al. [104] TMO.

Once frameworks are merged, the probability maps, P, are computed by
applying Equation (3.41) to each framework. These are multiplied by the
articulation factor, A, and the local anchor, W. Then Equation (3.42)
is applied by subtracting the scaled P from Y, where the logarithmic luminance is
stored. Finally, the range of Y is clamped and shifted into the range [-2, 0],
and the tone mapped luminance, Ld, is computed by exponentiation.

3.5.3 Interactive Local Manipulation of Tonal Values


A user-based system for modifying tonal values in an HDR/LDR image was
presented by Lischinski et al. [123]. The system is based on a brush interface
inspired by Levin et al. [117] and Agarwala et al. [8]. The user
specifies, by means of brushes, which regions in the image require an exposure
adjustment. There are four possible brushes available:

Basic brush. Sets constraints to pixels covered by the brush, assigning
a weight of w = 1.

Luminance brush. Applies a constraint to pixels whose luminance is
similar to those covered by the brush. If μ is the mean luminance
for pixels under the brush and σ is a brush parameter, a pixel with
luminance L is assigned a weight of w(L) = exp(-(L - μ)^2/σ^2)
(see the sketch after this list).
Luma-chrome brush. A brush similar to the Luminance one, but it
takes into account luminance and chromaticity.
Overexposure brush. Selects all overexposed pixels that are surrounded by the drawn stroke.
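
As a small illustration of the Luminance brush weight described above, the following Matlab sketch (a hypothetical helper, not part of the authors' system) computes w(L) for a given stroke mask and brush parameter σ:

function w = LuminanceBrushWeightSketch(L, strokeMask, sigma)
% Mean luminance under the stroke; pixels with similar luminance get weights near 1.
    mu = mean(L(strokeMask));
    w = exp(-(L - mu).^2/sigma^2);   % w(L) = exp(-(L - mu)^2 / sigma^2)
end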
Once the stroke is drawn using the various brushes, the system tries
to find a spatially varying exposure function f. This has to satisfy the


Figure 3.30. An example of the user-based system by Lischinski et al. [123]
applied to the Bottles2 HDR image. (a) The user decided to reduce the exposure
in the overexposed bottle, so a stroke is drawn in that area. (b) At this point
the linear system solver generates a smooth exposure field, keeping strong edges.
(c) The final result of the adjustment of the stroke in (a). (Images are courtesy
of Daniel Lischinski [123].)

constraints specified by the user and to take edges into account. In terms
of minimization, f can be defined as

f = \arg\min_f \left( \sum_x w(x)\big(f(x) - g(x)\big)^2 + \lambda\, h(f, \tilde{L}) \right),

where \tilde{L} = log(L) and L is the luminance of an LDR or HDR image. The
function w(x) ∈ [0, 1] defines constrained pixels, and g(x) is a function that
defines the target exposure. The minimization presents a smoothing term
that takes into account large gradients and is defined as

h(f, \tilde{L}) = \sum_x \left( \frac{|f_x(x)|^2}{|\tilde{L}_x(x)|^{\alpha} + \epsilon} + \frac{|f_y(x)|^2}{|\tilde{L}_y(x)|^{\alpha} + \epsilon} \right),  (3.43)

The variable α defines the sensitivity of the term to the derivatives of the
log-luminance image, and ε is a small nonzero value to avoid singularities.
The minimum f is calculated by solving a linear system [175]. Following
this, the image is adjusted by applying f (see Figure 3.30).
Finally, the system presents an automatic TMO for preview, which
follows the zone system by Adams [3]. The image is segmented into n zones,
using Equation (3.44), and the correct exposure is calculated for each zone:

n = \lceil \log_2(L_{\max}) - \log_2(L_{\min} + 10^{-6}) \rceil.  (3.44)

These exposures and segments are used as an input for the linear system
solver, generating the tone mapped image. An example is shown in Figure 3.31.
The system presents a user-friendly GUI and a complete tool for photographers,
artists, and final users to intuitively tone map HDR images or


Figure 3.31. An example of the automatic operator by Lischinski et al. [123]
applied to the Bottles2 HDR image. (a) The exposure map for each zone. (b) The
smoothed exposure map; note that sharp edges are kept. (c) The final tone
mapped image.

enhance LDR images. Moreover, it can be applied to other tasks such as
modifications of the depth of field and spatially varying white balancing.
Listing 3.32 provides the Matlab code of the Lischinski et al. [123]
TMO. The full code may be found in the file LischinskiTMO.m.
% Is alpha defined?
if(~exist('LSC_alpha'))
    LSC_alpha = 0.5;
end
% Number of zones in the image
maxL = max(max(L));
minL = min(min(L));
epsilon = 1e-6;
minLLog = log2(minL + epsilon);
Z = ceil(log2(maxL) - minLLog);
% Choose the representative Rz for each zone
fstopMap = zeros(size(L));
Lav = logMean(L);
for i=1:Z
    indx = find(L >= 2^(i + minLLog) & L < 2^(minLLog + i + 1));
    if(~isempty(indx))
        Rz = median(L(indx));
        % photographic operator
        Rz = (LSC_alpha*Rz)/Lav;
        f = Rz/(Rz + 1);
        fstopMap(indx) = log2(f/Rz);
    end
end
% Minimization process
fstopMap = LischinskiMinimization(log2(L + epsilon), fstopMap, 0.07*ones(size(L)));
imgOut = zeros(size(img));
for i=1:3
    imgOut(:,:,i) = img(:,:,i).*2.^fstopMap;
end

Listing 3.32. Matlab Code: The TMO proposed by Lischinski et al. [123].

The method takes the parameter LSC_alpha as input. This is the starting
exposure value for the tone mapping step. The algorithm starts by
extracting the minimum and maximum luminance values of the input HDR
image. The number of zones of the image is computed using Equation (3.44)
and stored in Z. The next step is to find a spatially varying exposure function
f that is represented by the variable fstopMap. A representative luminance
value Rz (R_z) is computed for each zone as the median luminance of the pixels
in the zone. The global operator of Reinhard et al. [180] is used to map R_z
to a target value f(R_z), which is stored in f. Finally, the target exposure is
stored in fstopMap.
Equation (3.43) is afterwards minimized using the LischinskiMinimization.m
function that may be found in the Tmo/util folder. This
function is shown in Listing 3.33 and is a typical solution of a sparse linear
system Ax = b.
function result = LischinskiMinimization(L, g, W)
% Parameters initialization
lambda = 0.2;
alpha = 1;
e = 0.0001;
[r, c] = size(L);
n = r*c;
% Generation of the b vector
g = g.*W;
b = reshape(g, r*c, 1);
% Generation of the A matrix
% Gradients computations
dy = diff(L, 1, 1);
dy = -lambda./(abs(dy).^alpha + e);
dy = padarray(dy, [1 0], 'post');
dy = dy(:);

dx = diff(L, 1, 2);
dx = -lambda./(abs(dx).^alpha + e);
dx = padarray(dx, [0 1], 'post');
dx = dx(:);

% Building A
A = spdiags([dx, dy], [-r, -1], n, n);
A = A + A'; % symmetric conditions
g00 = padarray(dx, r, 'pre'); g00 = g00(1:end - r);
g01 = padarray(dy, 1, 'pre'); g01 = g01(1:end - 1);

D = reshape(W, r*c, 1) - (g00 + dx + g01 + dy);
A = A + spdiags(D, 0, n, n);
% Solving the system
result = A\b;
% Reshaping the result
result = reshape(result, r, c);
end

Listing 3.33. Matlab Code: Minimization process routine.

3.5.4 Exposure Fusion


HDR images usually need to be assembled from a series of LDR ones before
tone mapping. A novel approach that can avoid the tone mapping step was
proposed by Mertens et al. [143]. The central concept of this operator is
to merge well-exposed pixels from each exposure.
The first step is the analysis of each LDR image to determine which pixels
can be used during merging. This is achieved by calculating three metrics
for each pixel: contrast, saturation, and well-exposedness. Contrast,
C, refers to the absolute value of the gradients in the image. Saturation,
S, is defined as the standard deviation of the red, green, and blue channels.
Well-exposedness, E, of luminance L determines if a pixel is well-exposed
in a fuzzy way as

E(L) = \exp\left( -0.5 \frac{(L - 0.5)^2}{\sigma^2} \right).  (3.45)

These three metrics are combined together, obtaining a weight W_i(x)
that determines the importance of the pixel for that exposure (Equation (3.46)):

W_i(x) = C_i(x)^{\omega_C} S_i(x)^{\omega_S} E_i(x)^{\omega_E},  (3.46)

where i refers to the ith image, and ω_C, ω_S, and ω_E are exponents that
increase the influence of a metric over the others. The N weight maps are
normalized such that their sum is equal to one at each pixel position in
order to obtain consistent results.
After analysis, the exposures are combined into the final image. To avoid
seams and discontinuities, the images are blended using Laplacian pyramids
[29]. While the weights are decomposed into Gaussian pyramids as
denoted with the operator G, exposure images Ik are decomposed into
Laplacian pyramids as denoted with the operator L. Thus, the blending is


(a)

(b)

(c)

(d)

(e)

Figure 3.32. An example of the fusion operator by Mertens et al. [143] applied
to the Tree HDR image. (a) The first exposure of the HDR image. (b) The
weight map for (a); note that pixels from the tree and the ground have high
weights because they are well exposed. (c) The second exposure of the HDR
image. (d) The weight map for (c); note that pixels from the sky have high
weights because they are well exposed. (e) The fused/tone mapped image using
Laplacian pyramids.

calculated as

L_l\{I_d\}(x) = \sum_{i=1}^{n} L_l\{I_i\}(x)\, G_l\{W_i\}(x),

where l is the lth level of the Laplacian/Gaussian pyramid. Finally,
L\{I_d\} is collapsed, obtaining the final tone mapped image I_d (see Figure 3.32).


% default parameters if they are missing
if(~exist('wE'))
    wE = 0.75;
end
if(~exist('wS'))
    wS = 0.5;
end
if(~exist('wC'))
    wC = 0.5;
end
% stack generation
stack = [];
[r, c] = size(img);
if((r*c) > 0)
    % Convert the HDR image into a stack
    stack = GenerateExposureBracketing(img, 2);
else
    % load images from the current directory
    images = dir(['*.', format]);
    [n, m] = size(images);
    for i=1:n
        stack(:,:,:,i) = double(imread(images(i).name))/255;
    end
end
% number of images in the stack
[r, c, col, n] = size(stack);

Listing 3.34. Matlab Code: Bracketing step of Mertens et al. TMO [143].

The main advantage of this operator is that a user does not need to generate
HDR images; also, it minimizes color shifts that can occur in traditional
TMOs. This is because well-exposed pixels are taken in the blend without
applying a real compression function, just a linear scale.
Listing 3.34, Listing 3.35, and Listing 3.36 provide the Matlab code
of the Mertens et al. [143] TMO. The full code may be found in the file
MertensTMO.m.
The method takes the exponents wE, wS, wC (ω_E, ω_S, and ω_C) and the
format of a series of LDR images as input. In our Matlab implementation,
we also included the possibility of having an HDR image as input (img),
which is converted into a series of LDR images with different exposures (see
the Matlab function GenerateExposureBracketing.m in the Tmo/util
folder). When an HDR image is not found as input, the program loads all
LDR images with file extension format from the local folder and organizes
them in a stack.


% Computation of weights for each image
total = zeros(r, c);
weight = zeros(r, c, n);
for i=1:n
    % calculation of the weights
    L = lum(stack(:,:,:,i));
    weightE = MertensWellExposedness(stack(:,:,:,i));
    weightS = MertensSaturation(stack(:,:,:,i));
    weightC = MertensContrast(L);
    % final weight
    weight(:,:,i) = (weightE.^wE).*(weightC.^wC).*(weightS.^wS);
    total = total + weight(:,:,i);
end
% Normalization of weights
for i=1:n
    weight(:,:,i) = RemoveSpecials(weight(:,:,i)./total);
end

Listing 3.35. Matlab Code: Weighting step of Mertens et al. TMO [143].

% empty pyramid
tf = [];
for i=1:n
    % Laplacian pyramid: image
    pyrImg = pyrImg3(stack(:,:,:,i), @pyrLapGen);
    % Gaussian pyramid: weight
    pyrW = pyrGaussGen(weight(:,:,i));
    % Multiplication image times weights
    tmpVal = pyrLstS2OP(pyrImg, pyrW, @pyrMul);
    if(i == 1)
        tf = tmpVal;
    else
        % accumulation
        tf = pyrLst2OP(tf, tmpVal, @pyrAdd);
    end
end
% Evaluation of Laplacian/Gaussian pyramids
imgOut = zeros(r, c, col);
for i=1:3
    imgOut(:,:,i) = pyrVal(tf(i));
end

Listing 3.36. Matlab Code: Pyramid step of Mertens et al. TMO [143].

Then, the metrics are computed for each element of the stack (Listing 3.35). First, the luminance is extracted from the stack and stored in
the variable L. Then the three metrics for well-exposedness, saturation, and


function We = MertensWellExposedness(img)
% sigma for the well-exposedness weights
sigma = 0.2; % as in the original paper
sigma2 = 2*sigma^2;
We = exp(-(img(:,:,1) - 0.5).^2/sigma2);
for i=2:3
    We = We.*exp(-(img(:,:,i) - 0.5).^2/sigma2);
end
end

Listing 3.37. Matlab Code: Well-exposedness metric Mertens et al. [143].

contrast are computed using the Matlab functions MertensWellExposedness.m,
MertensSaturation.m, and MertensContrast.m, respectively.
They may be found in the Tmo/util folder.
The well-exposedness Matlab function, Listing 3.37, applies Equation (3.45)
to the three color channels and the final output of the metric is
the product of the output of the three channels (We). The value of sigma
(σ) is fixed to 0.2 as used in the original paper.
The saturation metric is computed as the distance from the average
between the three color channels (see Listing 3.38). The contrast is measured as the output of a Laplacian filter on the luminance value L (see
Listing 3.39).
Once all metrics are computed, they are used to obtain the final weight
maps, weight, that will be used by the blending step. Before the blending
step, the normalization of the weight maps is needed. The total variable
function Ws = MertensSaturation(img)
mu = (img(:,:,1) + img(:,:,2) + img(:,:,3))/3;
sumC = (img(:,:,1) - mu).^2 + ...
       (img(:,:,2) - mu).^2 + ...
       (img(:,:,3) - mu).^2;
Ws = sqrt(sumC/3);
end

Listing 3.38. Matlab Code: Saturation metric Mertens et al. [143].

function Wc = MertensContrast(L)
Wc = abs(LaplacianFilter(L));
end

Listing 3.39. Matlab Code: Contrast metric Mertens et al. [143].


is used to store the sum of the N weight maps (N is the number of LDR
images) to be used in the last loop of Listing 3.35 for the normalization
step.
The next step is to merge pixels of LDR images using a Laplacian decomposition of the images and a Gaussian pyramid of the weight maps. To
perform this step, the Matlab functions for the Laplacian decomposition
and the Gaussian pyramid are used for each LDR image (pyrImg) and weight
map (pyrW), respectively. In tmpVal, the resulting Laplacian pyramid is
stored as the multiplication between the variables pyrImg and pyrW (for one
of the N images). tf stores the sum of the N resulting Laplacian pyramids.
Finally, the pyramid is collapsed into a single image using the Matlab
function pyrVal.m, which may be found in the LaplacianPyramid folder.

3.6 New Trends to the Tone Mapping Problem

In the state-of-the-art TMOs that we have seen so far in the previous sections,
very little attention has been given to color information,
and the output device has always been considered to be a standard LDR
monitor.
For instance, the compression of the HDR input to the low dynamic
range of the display has been performed only on the luminance range; only
the color ratio has been kept to be adapted, in the end, by the compressed
luminance range. In the last few years some research that attempts to
maintain color information has appeared. Such techniques describe how
the color appearance of the input image may be reproduced [11, 106] and
how the color can be corrected to be mapped into the lower dynamic range
of a display device, reducing the color distortion with respect to the original
colors of the HDR input image [138]. Techniques have also been presented
that try to develop a generic tone mapping curve that is able to fit and
simulate the existing TMOs [132]; additionally, these techniques adapt the
tone mapping output to be suitable for any display device [137].

3.6.1 Color Appearance


The differences among viewing environments in which an image is displayed
play an important role in how the image is perceived by the observer. As
an example, let's define the environment A as the one where the image
is displayed and the environment B as the one where the image has been acquired.
A and B may differ considerably in their lighting conditions, i.e., different
illumination under which the same image is observed, in their viewing
conditions (different background), etc.

104

3. Tone Mapping

Figure 3.33. Scheme proposed by Akyüz and Reinhard [11] on how to integrate
the use of a CAM in the context of the tone mapping problem.

A color appearance model (CAM) describes how the HVS adapts
to the different environments and how it will perceive the images in these
different environments. The first step of a CAM is to account for
the chromatic adaptation, which is a mechanism that enables the HVS
to adapt to the dominant colors of illumination. Afterwards, the color
appearance attributes that are correlated to perceptual attributes of color,
such as chroma, hue, etc., are computed. Once the appearance attributes
are computed, the CAM can be reversed and, by taking the new
environment conditions as input, the colors in this new environment can
be calculated.
Akyüz and Reinhard [11] have proposed a scheme to integrate the use
of a CAM in the context of tone mapping. Figure 3.33 demonstrates this
method, where a CAM is first applied to the original HDR input image
to predict the color appearance attributes. Then the color appearance
attributes, together with the new viewing conditions of the environment
where the HDR input image will be displayed, are given as input to the
reversed CAM that predicts the color stimulus of the HDR input image for
the new viewing environment. Before applying the dynamic range compression step via tone mapping, the output of the reversed CAM is reset
with the original luminance of the HDR input image while the chromatic
information is retained. This is done to separate the range compression
from the color appearance step; otherwise, uncontrollable results may be
generated [11].
Kuang et al. [106] proposed a CAM model, based on the iCAM02 framework [63], for tone mapping. Figure 3.34 shows the pipeline of iCAM06.


Figure 3.34. The iCAM06 pipeline proposed by Kuang et al. [106]. (The original
HDR image is courtesy of Mark Fairchild [65].)

The method takes input images in the XYZ color space in order to work in
a device-independent color space. Firstly, the model separates high (detail layer)
and low (base layer) frequencies using the bilateral filter. Secondly, the base
layer is tone mapped by applying the nonlinear response
of cones (a sigmoid function) after chromatic adaptation. This is computed
using a Gaussian-filtered version of the base layer. Thirdly, the detail
layer is processed by applying a power function to simulate the Stevens effect,
which predicts the increase of the local contrast when luminance levels are


Figure 3.35. A comparison between the iCAM06 by Kuang et al. [106] and iCAM
2002 by Fairchild and Johnson [63] applied to the Niagara Falls HDR image. (a) The
image processed with iCAM02; note that there is a purple color shift. (b) The
image processed with iCAM06; note that fine details are better preserved than
in (a) due to the bilateral decomposition. (The original HDR image is courtesy
of Mark Fairchild [65].)


increased. After the recombination of the base and detail layers, the image is converted into the IPT color space [88] in order to predict the Hunt
effect; an increase in luminance level results in an increase in perceived
colorfulness. Finally, an inverse model of the characterization of the display
device is applied to the image (see Figure 3.35).

3.6.2 Color Correction


We have dedicated a large part of this chapter to luminance compression via
tone mapping. Tone mapping adjusts the contrast relationship in the input
HDR image, allowing preservation of details and contrast in the output
LDR image. This can often cause a change in color appearance and, to
solve this problem, TMOs frequently employ an ad hoc color desaturation
step. This does not guarantee that the color appearance is fully preserved,
and furthermore a manual adjustment for each tone mapped image may
be required [138]. Mantiuk et al. [138] performed a series of experiments
to quantify and model the correction in color saturation required after
tone mapping. Given a tone mapping curve in the luminance domain, the
authors wanted to find the chrominance values for the output image that
match the input HDR image appearance (with no tone modification). In
other words, the main aim of Mantiuk et al. was not to compensate for
differences in viewing conditions between the display and the real-world
scenes, as happens in the color appearance models, but to preserve the
appearance of the reference image without tone modification when shown
on the same display. The best exposure of an HDR image, after taking
into account the display model of the device and where the image will be
displayed, is taken as the reference image.
The results of these experiments allow the relationship between the contrast factor c and the saturation factor s to be quantified and approximated
by a power function, as shown in Equation (3.47):
s(c) = \frac{(1 + k_1) c^{k_2}}{1 + k_1 c^{k_2}}.  (3.47)

Here k_1 and k_2 are parameters that were fitted using least squares fitting
for two different goals: nonlinear color correction and luminance preserving
correction. In the case of nonlinear color correction, the parameters are
1.6774 and 0.9925, respectively. In the case of luminance preserving correction,
the parameters are 2.3892 and 0.8552, respectively. The formula
can be easily integrated in existing TMOs, where the preservation of the
color ratio takes into account the saturation factor s as

C_d(x) = \left( \frac{C_w(x)}{L_w(x)} \right)^{s} L_d(x),  (3.48)



Figure 3.36. An example of the color correction technique by Mantiuk et al. [138].
A tone mapped version of (a) the reference HDR image (scaled and clamped for
visualization purposes) with (b) a simple color correction method S = 0.3, (c) a
simple color correction method S = 1, and (d) the color correction technique by
Mantiuk et al. [138]. (Images are courtesy of Rafal Mantiuk.)

where C denotes a color channel. As noted by the authors, the main
drawback of Equation (3.48) is that it alters the resulting luminance for
s ≠ 1 and for colors different from gray. The authors suggested a different
formula that preserves luminance and involves only linear interpolation
between chromatic and achromatic colors (Equation (3.49)):

C_d(x) = \left( \left( \frac{C_w(x)}{L_w(x)} - 1 \right) s + 1 \right) L_d(x).  (3.49)

Figure 3.36(d) shows a result of the color correction technique proposed
by Mantiuk et al. [138] compared with the traditional color correction used
in the tone mapping problem, making use of different saturation values S:
0.3 (c) and 1.0 (b), respectively.
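
As an illustration, the following Matlab sketch applies the color correction of Equations (3.47)-(3.49) to a single color channel. The function name is hypothetical, and the contrast factor c of the tone curve is assumed to be given; this is an illustrative sketch rather than the authors' reference implementation.

function Cd = ColorCorrectionSketch(Cw, Lw, Ld, c, luminancePreserving)
% Cw: input HDR color channel, Lw: input luminance, Ld: tone mapped luminance,
% c: contrast factor of the tone curve.
    if(luminancePreserving)
        k1 = 2.3892; k2 = 0.8552;   % luminance preserving correction
    else
        k1 = 1.6774; k2 = 0.9925;   % nonlinear color correction
    end
    s = ((1 + k1)*c^k2)/(1 + k1*c^k2);          % saturation factor, Equation (3.47)
    if(luminancePreserving)
        Cd = ((Cw./Lw - 1)*s + 1).*Ld;          % Equation (3.49)
    else
        Cd = ((Cw./Lw).^s).*Ld;                 % Equation (3.48)
    end
end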


3.6.3 Modeling a Generic Tone Mapping Operator


As we have seen in this chapter, several TMOs have been proposed and in
order to validate and/or analyze their quality, several validation methodologies have been published. We will discuss some of these in Chapter 6.
The overall validity of the outcome of these methodologies may be considered disputable due to the fact that the experiments have been done under
different preconditions, the implementation of the state-of-the-art TMOs
may differ slightly, the number of users used in the experiments may not
be sufficient to derive the final conclusions, etc. Mantiuk and Seidel [132]
proposed a different approach. Instead of proposing a validation technique,
they modeled the processing that is performed inside a TMO, proposing a
generic TMO that makes use of three steps: a per-pixel tone curve, a
modulation transfer function, and color saturation (see Figure 3.37). They
demonstrated that this solution may satisfactorily approximate the output
of many global and local TMOs.
The mathematical formula that extrapolates the data flow of the generic
TMO of Figure 3.37 for a single color component (red, green, or blue) is

C_d = MTF\big(f(L_w)\big) R^s.

The function MTF is the modulation transfer function, f is a global tone
curve applied to each pixel separately, s is the color saturation, and R is
the color ratio, as shown in Equation (3.50):

R = \frac{C_w}{L_w}.  (3.50)

The variable Cw is a color component of the input HDR image. Equation (3.50) is the typical approach used in the tone mapping problem for
removing distortion to colors due to range reduction.
Mantiuk and Seidel adopted as a tone curve (TC) a four-segment sigmoid
function (Equation (3.51)):

f(L_w) = \begin{cases}
0 & \text{if } L' \leq b - d_l,\\
\frac{1}{2} \frac{c (L' - b)}{1 - a_l (L' - b)} + \frac{1}{2} & \text{if } b - d_l < L' \leq b,\\
\frac{1}{2} \frac{c (L' - b)}{1 + a_h (L' - b)} + \frac{1}{2} & \text{if } b < L' \leq b + d_h,\\
1 & \text{if } L' > b + d_h,
\end{cases}  (3.51)
Figure 3.37. Data flow of the Generic TMO proposed by Mantiuk et al. [132].


Figure 3.38. An example of the generic TMO by Mantiuk and Seidel [132].
(a) A tone mapped HDR image using the fast bilateral filtering by Durand and
Dorsey [62]. (b) The same tone mapped scene using the generic TMO that
emulates the fast bilateral filtering TMO. (Images are courtesy of Rafal Mantiuk.)

where L' = \log_{10} L_w, b is the image brightness adjustment parameter, c is
the contrast parameter, and a_l, a_h are parameters that are responsible for
the contrast compression for shadows and highlights and are derived as

a_l = \frac{c d_l - 1}{d_l}, \qquad a_h = \frac{c d_h - 1}{d_h},

where d_l and d_h are the lower and higher mid-tone ranges, respectively.
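
The following Matlab sketch evaluates the four-segment sigmoid of Equation (3.51) under the definitions of a_l and a_h given above. It is an illustrative implementation with an assumed function name, not the authors' reference code.

function y = GenericToneCurveSketch(L, b, c, dl, dh)
% L: log10 luminance, b: brightness, c: contrast, dl/dh: lower/higher mid-tone ranges.
    al = (c*dl - 1)/dl;                       % shadow compression parameter
    ah = (c*dh - 1)/dh;                       % highlight compression parameter
    d = L - b;
    y = zeros(size(L));                       % L <= b - dl maps to 0
    low  = (d > -dl) & (d <= 0);
    high = (d > 0)   & (d <= dh);
    y(low)  = 0.5*c*d(low)./(1 - al*d(low)) + 0.5;
    y(high) = 0.5*c*d(high)./(1 + ah*d(high)) + 0.5;
    y(d > dh) = 1;                            % L > b + dh maps to 1
end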
Since f and the color saturation are sufficient to simulate most of the
global TMOs, the model simulates the spatially varying aspect of the local


Figure 3.39. An example where the generic TMO by Mantiuk and Seidel [132]
fails to model a TMO. (a) A tone mapped HDR image using the gradient domain
operator by Fattal and Lischinski [67]. (b) The same tone mapped scene using
the generic TMO, which is trying to emulate the gradient domain operator; fine
details in the original picture are not fully enhanced by the generic TMO. (Images
are courtesy of Rafal Mantiuk.)


TMOs through the MTF. This is a one-dimensional function that specifies
which spatial frequencies to amplify or compress. The authors adopted
a linear combination of five parameters and five basis functions to simulate
their MTF. In order to simulate a TMO, the parameters of f (including
s) and the MTF need to be estimated using fitting procedures. The authors
proposed to estimate f's parameters using the Levenberg-Marquardt
method and the MTF's parameters by solving a linear least-squares problem.
In proposing this generic TMO, Mantiuk and Seidel show how most of
the state-of-the-art TMOs adopt similar image processing techniques
where the differences lie in the strategy to choose the set of parameters
used. In Figure 3.38 and Figure 3.39 some results of the generic TMO
proposed by Mantiuk and Seidel [132] are shown. Figure 3.38 shows how the
method is able to simulate the result of the fast bilateral filtering by Durand
and Dorsey [62]. Figure 3.39 demonstrates how the method fails to
simulate the gradient domain operator by Fattal and Lischinski [67]. In
this case the details of the window, in the mirror, and of the bulb lamp
close to the window are completely lost.

3.6.4 Display Adaptive Tone Mapping


As already shown, a TMO is used for reproducing HDR images on a display
device with limited dynamic range, typically around 200 : 1, but display
devices can have different characteristics such as color gamut, dynamic
range, etc. An extra step is required to manually adjust an image on the
display device on which it would be visualized. This task is often tedious
and it requires expert skills to achieve good quality results. To solve this
problem, Mantiuk et al. [137] have proposed a TMO that adaptively adjusts
content given the characteristics of a particular display device. Figure 3.40
illustrates the pipeline of the proposed display adaptive TMO.

Figure 3.40. The pipeline of the display adaptive TMO proposed by Mantiuk et
al. [137].


Figure 3.41. An example of the adaptive TMO by Mantiuk et al. [137]. (a) A
tone mapped image for paper. (b) The same scene in (a) tone mapped for an
LDR monitor with 200 cd/m2 output. (c) The same scene in (a) tone mapped
for an HDR monitor with 3, 000 cd/m2 output. Note that images are scaled to
be shown on paper. (Images are courtesy of Rafal Mantiuk.)

As has been shown by the authors, it is not possible to have a perfect
match between the original HDR image and its rendering on the display
when the characteristics of the display are not enough to accommodate the
dynamic range and color gamut of the original HDR image. However, a
trade-off between the preservation of certain image features at the cost of
others can be achieved. The choice of the features to preserve is based on
the application, and this process can be automated by designing an
appropriate metric.
Mantiuk et al. tried to develop this kind of metric through the framework proposed in Figure 3.40, where a display-adapted image is generated
in such a way that it will be the best possible rendering of the original
scene. This goal is achieved by comparing the response of the HVS for an
image shown on the display Rdisp and for the original HDR input image
Rori . In order to compute the responses of the HVS, an HVS model is
integrated with a display model for the image displayed. In the case of the
original HDR input image, the HVS model is optionally integrated with
an image enhancement model. The parameters of the TMO are computed
through the minimization of the difference between R_ori and R_disp. Some
results of the application of the metric presented by Mantiuk et al. [137]
for media with different dynamic ranges are shown in Figure 3.41.

3.7 Summary

In the last 20 years, several approaches have been proposed to solve the
so-called tone mapping problem. They have tried to take into account
different aspects. These include local contrast reproduction, fine detail
preservation without introducing halo artifacts, simulation of the HVS
behavior, etc. Despite the large number of techniques, the dynamic range
compression has been mainly tackled on the luminance values of the input
HDR image, without properly taking into account how this was affecting
the color information. Only recently have researchers addressed this problem
by proposing the application of color appearance models to the HDR
imaging field, a more in-depth understanding of the relationship between
contrast and saturation for minimizing the color distortion, etc.
In spite of the large number of TMOs that have been developed, the
tone mapping problem is still an open issue. Even the introduction of HDR
displays does not solve this problem. This is because there are images that
can exceed the range of HDR displays, which is around 200,000 : 1 with
a luminance output peak of 3,000 cd/m2.
This chapter has given a critical overview of the available techniques to
solve the tone mapping problem; it has also presented the new trends that
we believe will be a central part of future research.

4 Expansion Operators for Low Dynamic Range Content

The expansion of LDR content is an emerging topic in the computer graphics community that links LDR and HDR imaging. This is achieved by
transforming LDR content into HDR content using operators commonly
termed expansion operators (EOs) (see Figure 4.1). This allows the large
amounts of legacy LDR content to be enhanced for viewing on HDR displays and to be used in HDR applications such as image-based lighting. An
analogy to expansion is the colorization of legacy black and white content
for color displays [117, 124, 231].
An EO is defined as a general operator g over an LDR image I as

g(I) : D_i^{whc} \rightarrow D_o^{whc},

where D_i ⊂ D_o, w is the width of I, h is the height of I, c is the number of
color channels in I, and D_i is the LDR domain. In the case of 8-bit images
D_i = [0, 255], and D_o is an HDR domain; in the case of single precision
floating point D_o = [0, 3.4 × 10^38] ⊂ R.
The application of an EO to an LDR single exposure content involves
the reconstruction of HDR content. This is an ill-posed problem since
information is missing in overexposed and underexposed regions of the
image/frame. Common steps in an EO are the following:
1. Linearization. Creates a linear relationship between real-world radiance values and recorded pixels. This step can be skipped when
visualizing images on HDR displays.
2. Expansion of pixel values. Increases the dynamic range of the image.
Usually, low values are compressed, high values are expanded, and
mid values are kept as in the original image.



Figure 4.1. The general concept of expansion operators. (a) A single exposure
image. (b) A graph illustrating the luminance scanline at coordinate x = 448
of this single exposure image. The red line shows the clamped luminance values
due to the single exposure capture of the image. The green line shows the full
luminance values when capturing multiple exposures. An expansion operator
tries to recover the green profile starting from the red profile.

3. Over/underexposed reconstruction. Generates the missing content in
the overexposed or underexposed areas of an LDR image.
4. Artifacts reduction. Decreases artifacts due to quantization or image
compression (i.e., JPEG and MPEG artifacts), which can be visible
after the expansion of pixel values.
5. Color correction. Keeps colors as in the original LDR image. During
expansion of pixel values colors are desaturated. This is the opposite
problem of what happens in tone mapping.
EOs have previously been referred to in the literature as inverse tone
mapping operators (iTMOs) [19] or reverse tone mapping operators (rTMOs) [182].
However, neither of these two terminologies is strictly correct,
and they do not fully capture the complex operations required to convert
LDR content into HDR content. iTMOs are EOs that invert TMOs. However,
many TMOs are not easily inverted because they require the inversion
of a partial differential equation (PDE), or other noninvertible operations.
Examples of these TMOs are LCIS [203], the fast bilateral filter TMO [62],
gradient domain compression TMO [67], etc. rTMOs, on the other hand,
only refer to those EOs that reverse the order of TMO operations.
As in the case of TMOs, color gamut is changed during expansion,
obtaining desaturated colors. To solve this problem, a straightforward
technique such as color exponentiation (Equation (3.2)) can be applied.


Here the exponent s needs to be set in the interval (1, +∞) (as opposed to
(0, 1]) to increase the saturation.
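
Assuming the color exponentiation of Equation (3.2) takes the familiar color-ratio form C_d = (C_w/L_w)^s L_d (the same form as Equation (3.48)), a minimal Matlab sketch of re-saturating an expanded image is shown below; the variable names img, L, and Lexp are assumptions for illustration.

% A minimal sketch: re-saturate an expanded image with s > 1.
s = 1.2;                                       % example exponent in (1, +inf)
imgExp = zeros(size(img));
for i=1:3
    imgExp(:,:,i) = ((img(:,:,i)./L).^s).*Lexp;   % img, L: LDR input and its luminance; Lexp: expanded luminance
end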

4.1 Linearization of the Signal Using a Single Image

Linearization of the signal is a very important step in many EOs. The
main reason to work in a linear space is that there is more control and
predictability over the expansion. In an unknown space, it is hard to predict
how the expansion will behave, as shown in Figure 4.2. Furthermore,
precise/estimated measurements of scene radiance are needed in order to
recover scene properties such as mean, geometric mean, standard deviation, etc.
When there is access to the capturing device, linearization can be performed by calculating the CRF. This function can be calculated using
[Figure 4.2 plots, panels (a)-(d): normalized measured sensor value and expanded measured sensor values plotted against normalized irradiance.]

Figure 4.2. An example of the need of working in linear space. (a) Measured
values from a sensor in the linear domain. (b) The expansion of the signal in (a)
using f(x) = 10e^{4x-4}. (c) Measured values from a sensor with an unknown CRF
applied to it. (d) The expansion of signal (c) using f. Note that the expanded
signal in (d) has a different shape from (b), which was the desired one.


methods presented in Section 2.1.2. This step is not needed when images are stored using a raw data format (RAW), because a RAW format
stores linear values from the CCD sensor. However, in the general case,
the CRF of the camera/video camera is not available, and images/videos
are stored in an 8-bit nonlinear format. In this case, the estimation of the
CRF needs to be carried out from a single image or a couple of frames.

4.1.1 Blind Inverse Gamma Function


Computer-generated images or processed RAW photographs are usually
stored with gamma correction. Farid [66] proposed an algorithm based on
multispectral analysis to blindly linearize the signal when gamma correction
is applied to the image. Farid observed that the application of gamma to
a signal introduces new harmonics. For example, given a simple signal
y(x) = a_1 \sin(\omega_1 x) + a_2 \sin(\omega_2 x),

if y(x) is gamma corrected and is rewritten using a second term Taylor
expansion, y(x)^2, the result is given by

y(x) + y(x)^2 = a_1 \sin(\omega_1 x) + a_2 \sin(\omega_2 x)
+ \tfrac{1}{2} a_1^2 \big(1 + \sin(2\omega_1 x)\big) + \tfrac{1}{2} a_2^2 \big(1 + \sin(2\omega_2 x)\big)
+ 2 a_1 a_2 \big(\sin((\omega_1 + \omega_2) x) + \sin((\omega_1 - \omega_2) x)\big).
As can be seen, new harmonics are introduced after gamma correction
that are correlated with the original ones: 2ω_1, 2ω_2, ω_1 + ω_2, and ω_1 - ω_2.
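
A small numerical illustration of this effect (not Farid's estimator itself) can be obtained in Matlab by comparing the spectra of a two-tone signal before and after adding its square; the frequencies below are arbitrary examples.

t  = 0:0.001:1;                     % 1 second at 1 kHz sampling
w1 = 2*pi*10; w2 = 2*pi*35;         % two example frequencies (10 Hz and 35 Hz)
y  = 0.7*sin(w1*t) + 0.3*sin(w2*t);
Y1 = abs(fft(y));                   % peaks only at 10 Hz and 35 Hz
Y2 = abs(fft(y + y.^2));            % extra peaks at 20, 70, 45, and 25 Hz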
High-order correlations introduced by the gamma in a one-dimensional
signal y(x) can be estimated with high-order spectra; for example, a normalized
bispectrum correlation estimates third-order correlations [142]:

b^2(\omega_1, \omega_2) = \frac{E\big[Y(\omega_1) Y(\omega_2) \overline{Y}(\omega_1 + \omega_2)\big]}{E\big[|Y(\omega_1) Y(\omega_2)|^2\big]\, E\big[|Y(\omega_1 + \omega_2)|^2\big]},

where Y is the Fourier transform of y. The value of b^2(ω_1, ω_2) can be
estimated using overlapping segments of the original signal y (Equation (4.1)):
b(\omega_1, \omega_2) = \frac{\left| \frac{1}{N} \sum_{k=0}^{N-1} Y_k(\omega_1) Y_k(\omega_2) \overline{Y_k}(\omega_1 + \omega_2) \right|}{\sqrt{\frac{1}{N} \sum_{k=0}^{N-1} |Y_k(\omega_1) Y_k(\omega_2)|^2 \; \frac{1}{N} \sum_{k=0}^{N-1} |Y_k(\omega_1 + \omega_2)|^2}},  (4.1)

where Yk is the Fourier transform of the kth segment, and N is the number
of segments. Finally, the gamma of an image is estimated by applying a
range of inverse gamma to the gamma-corrected image and choosing the


value that minimizes the measure of the third-order correlations as

\hat{\gamma} = \arg\min_{\gamma} \sum_{\omega_1, \omega_2} |b(\omega_1, \omega_2)|.

Farid evaluated this technique with fractal and natural images and
showed that the recovered γ values had average errors of between 5.3%
and 7.5%.

4.1.2 Radiometric Calibration from a Single Image


A general approach for linearization using a single image was proposed by
Lin et al. [122]. This approach exploits information at edges of the image
for approximating the CRF.
In an edge region there are mainly two colors, I1 and I2 , separated by
the edge. Colors between I1 and I2 are a linear interpolation of these two
and form a line in the RGB color space (see Figure 4.3(b)). When the CRF
is applied to these values, M = f(I), since it is a general nonlinear function,
the line of colors is transformed into a curve (see Figure 4.3(c)). In order to
linearize the signal, the inverse CRF, g = f^{-1}, has to map measured values
M(x) to a line defined by g(M_1) and g(M_2). This is solved by finding

Figure 4.3. A colored edge region. (a) The real scene radiance value and shape at
the edge with two radiance values R1 and R2 . (b) The irradiance pixels, which
are recorded by the camera/video camera with interpolated values between I1
and I2 . Note that colors are linearly mapped in the RGB color space. (c) The
measured pixels after the CRF application. In this case, colors are mapped on a
curve in the RGB color space.


Figure 4.4. A grayscale edge region. (a) The real scene radiance values and
shape at the edge with two radiance values R1 and R2 . (b) The irradiance pixels,
which are recorded by the camera/video camera. Irradiance values are uniformly
distributed. This results in a uniform histogram. (c) The measured pixels after
the CRF application. This transforms the histogram from a uniform one into a
nonuniform histogram.

a g that, for each pixel M(x) in an edge region \Omega, minimizes the distance from g(M(x)) to the line through g(M_1) and g(M_2):

D(g, \Omega) = \sum_{x \in \Omega} \frac{\big| (g(M_1) - g(M_2)) \times (g(M(x)) - g(M_2)) \big|}{|g(M_1) - g(M_2)|}.

Edge regions suitable for the algorithm are chosen from nonoverlapping 15 × 15 windows with two different uniform colors along the edges, which are detected using a Canny filter [74]. The uniformity of the two colors in a region is measured using their variance. To improve the quality of g and complete its missing parts, a Bayesian learning step is added using inverse CRFs from real cameras [77].
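A minimal sketch of the per-region cost described above follows. It is not part of the HDR Toolbox; the lookup-table representation of the candidate inverse CRF g and all the names used here are assumptions made for illustration.

% A minimal sketch of the edge-region cost used to score a candidate
% inverse CRF g, in the spirit of Lin et al. [122]. Illustrative only.
% g is assumed to be a monotonic lookup table sampled on [0,1], M1 and M2
% are the two measured edge colors (1x3), and M is an Nx3 matrix of
% measured pixels inside the edge region.
function D = EdgeRegionCostSketch(g, M1, M2, M)
    g1 = EvalCRF(g, M1);
    g2 = EvalCRF(g, M2);
    d = g1 - g2; % direction of the ideal line in RGB space
    D = 0;
    for i=1:size(M, 1)
        v = EvalCRF(g, M(i, :)) - g2;
        % distance of g(M(x)) from the line through g(M1) and g(M2)
        D = D + norm(cross(d, v)) / norm(d);
    end
end

function out = EvalCRF(g, rgb)
    % evaluate the lookup table g per color channel
    n = length(g);
    idx = round(rgb * (n - 1)) + 1;
    out = g(idx);
end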
Lin and Zhang [121] extended Lin et al.'s method [122] to grayscale images. The variable g is now estimated using a function that maps nonuniform histograms of M into uniform ones (see Figure 4.4). They propose
a measure for determining the uniformity of a histogram H of an image, which is defined as

N(H) = \sum_{k=I_{min}}^{I_{max}} \left( \frac{|H(k)|}{|H|} - \frac{1}{b} \right)^2 + \lambda \sum_{n=1}^{3} \left( \frac{|H_n|}{|H|} - \frac{1}{3} \right)^2,

where |H| is the number of pixels in H, |H(k)| is the number of pixels of intensity k, b is the number of grayscale levels, \lambda is an empirical parameter, I_{min} and I_{max} are the minimum and maximum intensity in H, and H_n is


a cumulative function defined as

H_n = \sum_{i = I_{min} + \frac{(n-1) b}{3}}^{I_{min} + \frac{n b}{3} - 1} |H(i)|.

The inverse CRF is calculated similarly to Lin et al.'s method [122], where g is a function that minimizes the histogram uniformity for each edge region, as shown in Equation (4.2):

D(g, \Omega) = \sum_{H \in \Omega} \lambda_H \, N(g(H)). \qquad (4.2)

Here \lambda_H = \frac{|H|}{b} is a weight that gives more importance to dense histograms.
Regions are chosen as in Lin et al. [122], and g is refined by Bayesian learning. The method can be applied to colored images by applying it to each color channel.

4.2 Decontouring Models for High Contrast Displays

Daly and Feng [46, 47] proposed a couple of methods for extending the bit depth of classic 8-bit images and videos (effectively 6-bit due to MPEG-2 compression) for 10-bit monitors. New LCD monitors present a higher contrast, typically around 1,000:1, and a high luminance peak that is usually around 400 cd/m². This means that displaying 8-bit data without any refinement would entail having the content linearly expanded for the higher contrast, resulting in artifacts such as banding/contouring. The goal of Daly and Feng's methods is to create a medium dynamic range image, removing contouring in the transition areas, without particular emphasis on overexposed and underexposed areas.

4.2.1 Amplitude Dithering


The first method proposed by Daly and Feng [46] is based on amplitude dithering by Roberts [183] (see Figure 4.5). Amplitude dithering, or noise modulation, is a dithering technique that simply adds a noise pattern to an image before quantization. This noise pattern is removed when the image needs to be visualized. The bit depth is perceived as higher than the real one because of the subsequent averaging that happens in the display and in the HVS. Roberts' technique was modified by Daly and Feng to apply it to high contrast displays. Subtractive noise was employed instead of additive noise, since during visualization a monitor cannot remove it. The authors


Figure 4.5. The pipeline for bit depth extension using amplitude dithering by
Daly and Feng [46].

modeled the noise by combining the effect of fixed pattern display noise and the noise perceived by the HVS, making the noise invisible. They used the contrast sensitivity function (CSF), which is a two-dimensional, anisotropic function derived from psychophysical experiments [48]. The CSF is extended in the temporal dimension [227] for moving images, which allows the noise to have a higher variance; furthermore, they show that the range can be extended by an extra bit.
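The basic mechanism of Roberts-style amplitude dithering can be sketched in a few lines. This is illustrative only, not the Daly and Feng implementation; the bit depth and the test image are arbitrary choices.

% A minimal sketch of amplitude dithering: a noise pattern is added before
% quantization to p bits and subtracted again at display time, so that the
% averaging in the display and in the HVS hides the quantization steps.
p = 6;                                   % target bit depth
Q = 2^p - 1;
img = double(imread('peppers.png')) / 255;
noise = (rand(size(img)) - 0.5) / Q;     % one quantization step of noise
imgQ = round((img + noise) * Q) / Q;     % quantize together with the noise
imgDisplay = imgQ - noise;               % the noise pattern is removed for display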

4.2.2 Contouring Removal


The second algorithm proposed by Daly and Feng [47] presented a different approach, where contours are removed instead of being masked with invisible noise. The first step of the algorithm is to filter the starting image at p bits using a low-pass filter (see Figure 4.6). The filter needs to be wide enough to span across the false contours. Note that this operation

Figure 4.6. The pipeline for bit depth extension using decontouring by Daly and
Feng [47].


increases the bit depth to n > p because, during averaging, a higher precision is needed than that of the original values. Then this image is quantized at p bits, where any contour that appears is a false one because the image has no high frequencies. Subsequently, the false contours are subtracted from the original image, and the filtered image at p bits is added to restore the low frequency components. The main limitation of the algorithm is that it does not remove artifacts at high frequencies, but these are hard to detect by the HVS due to frequency masking [69].
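A minimal sketch of this decontouring idea is given below. It is illustrative only and not the authors' implementation; the filter size and the bit depth are assumptions.

% A minimal sketch of the decontouring idea in Daly and Feng [47].
% The low-passed image is re-quantized to p bits so that only false
% contours appear; these are then subtracted from the original image.
function imgOut = DecontouringSketch(img, p)
    % img is assumed to be a single-channel image in [0,1] quantized at p bits
    Q = 2^p - 1;
    h = fspecial('average', 17);             % low-pass filter wider than the contours
    imgLow = imfilter(img, h, 'replicate');  % filtered image at a precision n > p
    imgLowQ = round(imgLow * Q) / Q;         % re-quantization exposes the false contours
    falseContours = imgLowQ - imgLow;        % false contours only (no high frequencies here)
    imgOut = img - falseContours;            % subtract them from the original image
end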

4.3 EO MATLAB Framework

EOs have two similar common steps. The first one is the linearization of the signal and the extraction of the luminance channel. Linearization is achieved by applying gamma removal. This is a straightforward and computationally cheap method that is adopted by most of the existing EOs. The second one, which is the last step in the implementation of an EO, is the restoration of color information in the expanded input image (output image).
In the first step, shown in Listing 4.1, the input image img is checked to see whether it is a color image using the function check3Color.m under the Util folder. Then, the input parameters are verified if they are set; if not, they are set equal to the default parameters suggested by the authors in their original work. An input parameter common to all EOs is gammaRemoval. If it is higher than zero, this means that img has not been gamma corrected, and an inverse gamma correction step is computed by exponentiating the input image img with the gammaRemoval parameter. Otherwise, img is already corrected. At this point, the luminance is extracted using the lum function applied to the whole input image img.
In the second step, as shown in Listing 4.2, the color information is reinserted into the output expanded image imgOut. This is followed by the
check3Color(img);
if(~exist('Landis_alpha') | ~exist('dynRangeStartLum') | ~exist('Landis_Max_Luminance') | ~exist('gammaRemoval'))
    Landis_alpha = 2.0;
    dynRangeStartLum = 0.5;
    Landis_Max_Luminance = 10.0;
    gammaRemoval = -1;
end
if(gammaRemoval > 0.0)
    img = img.^gammaRemoval;
end

Listing 4.1. Matlab Code: The initial steps common to all EOs.


imgOut = zeros(size(img));

for i=1:3
    imgOut(:,:,i) = (img(:,:,i) .* l2) ./ l;
end
imgOut = RemoveSpecials(imgOut);

Listing 4.2. Matlab Code: The final steps common to all EOs.

removal of any invalid pixel values generated, using the function RemoveSpecials.m under the Util folder.
An optional step, as with tone mapping, is color correction. In this case, the function ColorCorrection.m under the ColorSpace folder can be applied again, but values for the variable correction need to be in (1, +∞). Note that a Matlab implementation is not provided for all EOs in this chapter. This is because some EOs are very complex systems composed of many parts that are difficult to describe in their entirety.

4.4 Global Models

Global models are those that apply the same global expansion function to the LDR content at each pixel of the entire image.

4.4.1 A Power Function Model for Range Expansion


One of the first expansion methods was proposed by Landis [109]. This expansion method, used primarily for relighting digital three-dimensional models from images (see Chapter 5), is based on power functions. The luminance expansion is defined as

L_w(x) = \begin{cases} (1 - k)\,L_d(x) + k\,L_{w,max}\,L_d(x) & \text{if } L_d(x) \geq R,\\ L_d(x) & \text{otherwise}; \end{cases} \qquad k = \left( \frac{L_d(x) - R}{1 - R} \right)^{\alpha}, \qquad (4.3)

where R is the threshold for expansion (which is equal to 0.5 in the original work), L_{w,max} is the maximum luminance that the user needs for the expanded image, and \alpha is the exponent of falloff that controls the stretching of the tone curve.
While this technique produces suitable HDR environment maps for IBL (see Figure 4.7), it may not produce good quality images/videos for visualization on HDR monitors. This is because it does not handle problems such as the exaggeration of compression or quantization artifacts.


Figure 4.7. An example of IBL using Landis' operator [109]. (a) The starting LDR environment map. (b) The Happy Buddha relit using the image in (a). (c) The Happy Buddha relit using the expanded environment map from (a). Note that directional shadows from the sun are now visible. (The Happy Buddha model is courtesy of the Stanford 3D Models Repository.)

The Matlab code for the Landis EO [109] is available in the file LandisEO.m under the folder ExpansionMethods. Landis' method takes as input the following parameters: \alpha is Landis_alpha, the threshold R is dynRangeStartLum, and L_{w,max} is Landis_Max_Luminance.
The code of the Landis EO is shown in Listing 4.3. After the common initial steps are performed, the mean luminance value of img is computed using
% Luminance channel
l = lum(img);
% set the threshold to the luminance mean value
if(dynRangeStartLum < 0)
    dynRangeStartLum = mean(mean(l));
end
% search for the pixels that need to be expanded
toExpand = find(l >= dynRangeStartLum);
l2 = l;
% expansion step
l2(toExpand) = ((l(toExpand) - dynRangeStartLum) / (1.0 - dynRangeStartLum)).^Landis_alpha;
l2(toExpand) = l(toExpand) .* (1 - l2(toExpand)) + Landis_Max_Luminance * l(toExpand) .* l2(toExpand);

Listing 4.3. Matlab Code: The expansion method by Landis [109].


the Matlab function mean. Next, the pixels above dynRangeStartLum are selected using the Matlab function find and are stored in the variable toExpand. The first line of the expansion step computes the parameter k of Equation (4.3), and the second line directly applies it. Note that only pixels above dynRangeStartLum are modified, following Equation (4.3); otherwise, the original luminance is kept in the variable l2.

4.4.2 Linear Scaling for HDR Monitors


In order to investigate how well LDR content is supported by HDR displays, Akyüz et al. [14] ran a series of psychophysical experiments. The experiments were run to evaluate tone mapped images, single exposure images, and HDR images using a Dolby DR-37P HDR monitor. The experiments involved 22 naïve participants between 20 and 40 years old. In all experiments, ten HDR images, ranging from outdoor to indoor and from dim to very bright lighting conditions, were used. The HDR images had around five orders of magnitude of luminance in order to be mapped directly to the Dolby DR-37P HDR monitor [57].
The first experiment was a comparison between HDR and LDR images produced using various TMOs [62, 110, 180], an automatic exposure (that minimizes the number of over/underexposed pixels), and an exposure chosen by subjects in a pilot study. Images were displayed on the DR-37P using calibrated HDR images and LDR images calibrated to match the appearance on a Dell UltraSharp 2007FP 20.1-inch LCD monitor. Subjects had the task of ranking the images that looked best to them. For each original test image, a subject had to watch a trial image for two seconds, which was randomly chosen from the different types of images. The experimental results showed that participants preferred HDR images. The authors did not find a large difference in participant preference between tone mapped and single exposure (automatic and chosen by the pilot) images.
In the second experiment, the authors compared expanded single exposure images with HDR and single exposure images (automatic and chosen by the pilot). To expand the single exposure images, they employed the expansion method in Equation (4.4):


L_w(x) = k \left( \frac{L_d(x) - L_{d,min}}{L_{d,max} - L_{d,min}} \right)^{\gamma}. \qquad (4.4)

Here k is the maximum luminance intensity of the HDR display, and \gamma is the nonlinear scaling factor. For this experiment, images with different \gamma values equal to 1, 2.2, and 0.45 were generated. The setup and the ranking task were the same as in the first experiment. The results showed that brighter chosen exposure expanded images were preferred to HDR images, and vice


l = lum(img);
lmax = max(max(l));
lmin = min(min(l));
l2 = Oguz_Max * (((l - lmin) / (lmax - lmin)).^Oguz_gamma);

Listing 4.4. Matlab Code: The expansion method by Akyüz et al. [14].

versa when they had the same mean luminance. Akyüz et al. suggested that mean luminance is preferable to contrast. Finally, another important result is that linear scaling, where \gamma = 1, generated the most favored images, suggesting that a linear scaling may be enough for an HDR experience.
The authors worked only with high resolution, artistically captured HDR images without compression artifacts. While this works well under such ideal conditions, in more realistic scenarios, such as television programs or DVDs where compression is employed, this may not always be the case. In these cases, a more accurate expansion needs to be performed in order to avoid the amplification of compression artifacts and contouring.
Listing 4.4 provides the Matlab code of Akyüz et al.'s EO [14]. The full code may be found in the file AkyuzEO.m. The method takes the following parameters as input: the maximum luminance intensity of the HDR display Akyuz_Max (k in Equation (4.4)) and the nonlinear scaling factor Akyuz_gamma (\gamma in Equation (4.4)). After the preliminary steps, the maximum and the minimum luminance pixels are calculated from the variable l using the Matlab functions max.m and min.m, respectively. The expansion step, which is equivalent to Equation (4.4), is then executed.

4.4.3 Gamma Expansion for Overexposed LDR Images


Masia et al. [139] conducted two psychophysical studies to analyze the behavior of EOs across a wide range of exposure levels. The authors then used the results of these experiments to develop an expansion technique for overexposed content. In these experiments, three EOs were compared (Rempel et al. [182], linear expansion, and Banterle et al. [19]), and the results show how the performances decrease as the exposure of the content increases. Moreover, the authors observed that several artifacts, typically visible in LDR renditions of an image, were produced by EOs. These artifacts were not simply due to incorrect intensity values but also had a spatial component. Understanding how an EO can affect the perception of the final output image may help to develop better EOs.
As shown in Figure 4.8, the authors noticed that details are lost as the exposure increases; this is due to pixel saturation, which makes the corresponding colors fade to white. Based on this observation, they



Figure 4.8. An example of Masia et al.'s method [139] applied to an overexposed LDR image at different exposures. (a) Original LDR image. (b), (c), (d) Different f-stops after expansion. (The original image is courtesy of Diego Gutierrez.)

proposed to expand an LDR image by making the remaining details more prominent. This approach is different from the usual EO approach, where saturated areas are boosted.
A straightforward way to achieve this is gamma expansion. Masia et al. proposed an automatic way to define the \gamma value based on the dynamic content of the input LDR image. Similarly to Akyüz and Reinhard [11], a key value k is computed as
k = \frac{\log L_{d,H} - \log L_{d,min}}{\log L_{d,max} - \log L_{d,min}}, \qquad (4.5)

where L_{d,H}, L_{d,min}, and L_{d,max} are respectively the logarithmic average, the minimum luminance value, and the maximum luminance value of the input image. The k value is a statistic that helps clarify whether the input image is subjectively dark or light. In order to predict the gamma value automatically, a pilot study was conducted where users were asked to manually adjust the \gamma value in a set of images. The data was empirically fitted with linear regression to the relationship

\gamma(k) = a\,k + b. \qquad (4.6)


Figure 4.9. An example where Masia et al.'s method [139] produces an incorrect \gamma value. (a) LDR image used in this example. (b), (c), (d) Different f-stops after expansion, with \gamma(k) = -1.475. This introduces a reciprocal that produces an unnatural appearance. (The original image is courtesy of Paul Debevec.)

From the fitting, the values a = 10.44 and b = -6.282 are obtained. One of the major drawbacks of this expansion technique is that it may fail to utilize the dynamic range to its full extent [139]. Moreover, the a and b values only work correctly on the set of test images. In some images not belonging to the original data set, \gamma can have negative values. This results in an unnatural appearance (see Figure 4.9).
Listing 4.5 provides the Matlab code of Masia et al.'s operator [139]. The full code can be found in the file MasiaEO.m. The method takes as input the LDR image, img. After the initial steps common to all
% Calculate luminance
L = lum(img);
% Calculate image statistics
Lav = logMean(L);
[r, c] = size(L);
maxL = MaxQuart(L, 0.99);
minL = MaxQuart(L, 0.01);
imageKey = (log(Lav) - log(minL)) / (log(maxL) - log(minL));
% Calculate the gamma correction value
a_var = 10.44;
b_var = -6.282;
gamma_cor = imageKey * a_var + b_var;
imgOut = img.^gamma_cor;

Listing 4.5. Matlab Code: The expansion method by Masia et al. [139].


EOs, the maximum and the minimum luminance pixels are calculated from the variable L using the MaxQuart.m function under the Util folder, which extracts percentiles. Then, the image key is calculated as in Equation (4.5). Finally, each color channel is exponentiated by the value gamma_cor, which is calculated following Equation (4.6).

4.5 Classification Models

The methods of Meylan et al. [145, 146] and Didyk et al. [56] attempt to expand different regions of the LDR content by identifying or classifying different parts in the image such as highlights and light sources.

4.5.1 Highlights Reproduction for HDR Monitors


Meylan et al. [145, 146] presented an EO with the specific task of representing highlights in LDR images when displayed on HDR monitors. The main idea is to detect the diffuse and specular parts of the image and to expand them using different linear functions.
The first step of the algorithm is to calculate a threshold \omega that divides highlight and diffuse regions in the luminance channel, L_d (see Figure 4.10). Firstly, the image is filtered using a box filter of size m =

Figure 4.10. The pipeline for the calculation of the maximum diffuse luminance value in an image in Meylan et al. [146]. (The original image is courtesy of Ahmet Oğuz Akyüz.)


Figure 4.11. The full pipeline for the range expansion in Meylan et al.'s method [146]. (The original image is courtesy of Ahmet Oğuz Akyüz.)

max(width, height)/50 to calculate the value t1 as the maximum of the filtered luminance channel. This operation is repeated with a filter of size 2m + 1 to calculate t2. Secondly, t1 is used as a threshold on the original luminance, resulting in a mask. Subsequently, erosion and dilation filters are applied to the mask using t2. These can be applied for several iterations in order to obtain a stable mask; a few iterations are typically enough. At this point, pixels with the value 1 in the mask are considered specular pixels, while black pixels are considered diffuse ones. Therefore, \omega is calculated as the minimum luminance value of the specular pixels in L_d.
After the calculation of \omega, the luminance channel is expanded using the function in Equation (4.7):

L_w(x) = f(L_d(x)) = \begin{cases} s_1\,L_d(x) & \text{if } L_d(x) \leq \omega,\\ s_1\,\omega + s_2\,(L_d(x) - \omega) & \text{otherwise}; \end{cases} \qquad s_1 = \frac{\lambda}{\omega}, \quad s_2 = \frac{1 - \lambda}{L_{d,max} - \omega}, \qquad (4.7)

where L_{d,max} = 1 since the image is normalized, and \lambda is the percentage of the HDR display luminance allocated to the diffuse part, which is defined by the user.
A global application of f can lead to quantization artifacts around the enhanced highlights. These artifacts are reduced by applying a selective filter in the specular regions. Figure 4.11 shows the full pipeline. Firstly,


the expanded luminance, f(L_d(x)), is filtered with a 5 × 5 average filter, obtaining f'. Secondly, f(L_d(x)) and f' are blended using linear interpolation and a mask, which is calculated by thresholding the LDR luminance with \omega and applying a dilation and a 5 × 5 average filter.
Finally, the authors ran a series of psychophysical experiments to determine the value of \lambda for f using a Dolby DR-37P HDR monitor [57]. The results showed that for outdoor scenes users preferred a high value of \lambda, which means a small percentage of the dynamic range allocated to highlights, while for indoor scenes the percentage was the opposite. For indoor and outdoor scenes of equal diffuse brightness, users chose a low value for \lambda, so they preferred more range allocated to highlights. The analysis of the data showed that \lambda = 0.66 is a good general estimate.
This algorithm is designed for a specific task: the reproduction of highlights on HDR monitors. Its use for other tasks, such as the enhancement of videos, needs more processing and a classifier, as emphasized by the authors' evaluation experiment.
Listing 4.6, Listing 4.7, and Listing 4.8 provide the Matlab code of Meylan et al.'s EO [145, 146]. The full code can be found in the file MeylanEO.m.
% Luminance channel
l = lum(img);
lmax = max(max(l));
% Filtering with a box filter of size m+1
vS = max(size(l)); % max(width, height)
kSize = round(vS / 50) + 1;
h = fspecial('average', kSize);
Lfiltered = imfilter(l, h);
t1 = max(max(Lfiltered));
% Filtering with a box filter of size 2m+1
kSize = round(vS / 25) + 1;
h = fspecial('average', kSize);
Lfiltered = imfilter(l, h);
t2 = max(max(Lfiltered));
% Thresholding the image luminance channel with threshold t1
mask = zeros(size(l));
indx = find(l > t1);
mask(indx) = 1;
% Removing single pixels
mask = bwmorph(mask, 'clean');
% n step Erosion and Dilation
H_iter = [1,1,1; 1,0,1; 1,1,1];
n = 3;
for i=1:n
    % Mask2
    tmpMask2 = imfilter(mask, H_iter);
    Mask2 = ones(size(l));
    Mask2(find(mask == 0)) = 0;
    Mask2(find(tmpMask2 < 1)) = 0;
    % Mask3
    tmpMask2 = imfilter(Mask2, H_iter);
    Mask3 = zeros(size(l));
    Mask3(find(Mask2 == 1)) = 1;
    Mask3(find(Mask2 > 3)) = 1;
    Mask3(find(l > t2)) = 1;
    mask = Mask3;
end
itD = find(mask == 0); % Diffuse part
itS = find(mask == 1); % Specular part

Listing 4.6. Matlab Code: The classification of the specular and diffuse areas of an image by Meylan et al. [145, 146].

The function takes the following parameters as input: Meylan_Max, which is the maximum HDR display luminance, and Meylan_lambda, which is the percentage of the HDR display luminance allocated to the diffuse part of the input image. The first part of the code identifies the specular and diffuse parts of the LDR input image, img. The image is first filtered with average filters of size m + 1 and 2m + 1 (kSize), where m is equal to 1/50 of the maximum image dimension. The filters are generated with the Matlab function fspecial.m and applied to the luminance channel l using the Matlab function imfilter.m. Two thresholds, t1 and t2, are calculated from these filtered images as the maximum value of each image. Then, a thresholding step is performed on the input LDR image luminance to isolate the pixels above the threshold t1. This is done by generating a binary mask, mask, where the value 1 marks the locations of pixels above the threshold t1, and 0 is given to the rest. The threshold t2 is used for the erosion and dilation step using the Matlab function bwmorph.m. This step removes single pixels (with erosion) on mask and stores the result in Mask2.
% Calculation of the curve constants
omega = min(min(l(mask == 1)));
s1 = Meylan_Max * Meylan_lambda / omega;
s2 = Meylan_Max * (1.0 - Meylan_lambda) / (lmax - omega);
L = zeros(size(l));
L(itD) = l(itD) * s1; % Diffuse expansion
L(itS) = l(itS) * s1 + (l(itS) - omega) * s2; % Specular expansion
Listing 4.7. Matlab Code: The expansion step in Meylan et al. [145, 146].


% Filtered luminance
h5 = fspecial('average', 5);
LFiltered = imfilter(L, h5);
% Smoothing mask
smask = zeros(size(l));
smask(l > omega) = 1;
tmpSmask2 = imfilter(smask, H_iter);
smask2 = smask;
smask2(find(tmpSmask2 > 1)) = 1;
smask3 = imfilter(smask2, h5);
% The expanded part and its filtered version are blended using the mask
Lfinal = L .* (1 - smask3) + smask3 .* LFiltered;
Listing 4.8. Matlab Code: The blending step in Meylan et al. [145, 146].

Once the diffuse and specular regions are extracted, Equation (4.7) is applied as in Listing 4.7. The parameter \omega is computed as a result of the erosion and dilation step and used in the computation of the expanded luminance.
In the final step, Listing 4.8, the expanded luminance, L, is filtered using a low pass filter, h5, with imfilter.m, obtaining LFiltered. Then, the blending mask, smask3, is computed by applying the dilation filter H_iter, followed by a single pixel removal step and a low pass filter using h5. Finally, L and LFiltered are linearly interpolated using smask3.

4.5.2 Enhancement of Bright Video Features for HDR Displays


Didyk et al. [56] proposed an interactive system for enhancing the brightness of LDR videos, targeting and showing results for DVD content. The main idea of this system is to classify a scene into three components: diffuse, reflections, and light sources, and then to enhance only the reflections and light sources. The authors explained that diffuse components are difficult to enhance without creating visual artifacts, and it was probably the intention of filmmakers to show them saturated, as opposed to light sources and clipped reflections. The system works on nonlinear values because the goal is enhancement and not physical accuracy.
The system consists of three main parts: preprocessing, classification, and enhancement of clipped regions. The pipeline can be seen in Figure 4.12. The preprocessing step generates the data needed during the classification. In particular, it determines clipped regions using a flood-fill algorithm. At least one channel must be saturated (over 230 for DVD content), and luma values must be greater than 222. Also, at this stage,


Figure 4.12. The pipeline of the system proposed by Didyk et al. [56]: preprocessing (calculation of the feature vectors, optical flow, and clipped regions), classification of regions using temporal coherence and a training set, user corrections (with updating of the training set), and brightness enhancement.

optical flow is calculated, as well as other features such as image statistics, geometric features, and neighborhood characteristics.
Classification determines lights, reflections, and diffuse regions in a frame and relies on a training set of 2,000 manually classified regions. Primarily, a support vector machine [208] with kernel k(z, z') = \exp(-\|z - z'\|^2) performs an initial classification of the regions. Subsequently, motion tracking improves the initial estimation, using a nearest neighbor classifier based on a Euclidean metric:

d^2((z, x, t), (z', x', t')) = 50\,\|z - z'\|^2 + \|x - x'\|^2 + 5\,(t - t')^2,

where z are the region features, x are the coordinates in the image, and t is the frame number. This allowed a classification error of 12.6% to be reached on all the regions used in the tests. Tracking of clipped regions using motion compensation further reduced the percentage of objects that require manual correction to 3%. Finally, the user can supervise the classified regions, correcting wrong classifications using an intuitive user interface (see Figure 4.13).
Clipped regions are enhanced by applying a nonlinear adaptive tone curve. This is calculated based on the partial derivatives within a clipped region, stored in a histogram H. The tone curve is defined as a histogram equalization on the inverted values of H, as shown in Equation (4.8):

f(b) = k \sum_{j=2}^{b} (1 - H[j]) + t_2. \qquad (4.8)

Here t_2 is the lowest luma value for a clipped region, and k is a scale factor that limits the boost to the maximum boosting value m (equal to 150% for lights and


Figure 4.13. The interface used for adjusting classification results in Didyk et al.'s framework [56]. (The image is courtesy of Piotr Didyk.)

125% for reflections):

k = \frac{m - t_2}{\sum_{j=1}^{N} (1 - H[j])}.

The value N is the number of bins in H. To avoid contouring during boosting, the luma channel is filtered with a bilateral filter, separating it into fine details and a base layer, which are merged after luma expansion. The method is semiautomatic because the intervention of the user is required.
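Since no implementation of this operator is provided in the HDR Toolbox, a minimal sketch of the adaptive tone curve of Equation (4.8) for one clipped region is given below. All names and the example choice of m are illustrative assumptions.

% A minimal sketch of the adaptive tone curve of Equation (4.8) for one
% clipped region. H is assumed to be a normalized histogram of partial
% derivatives with N bins, t2 the lowest luma value of the region, and m
% the maximum boosting value (e.g., 1.5 times the clipping level for lights).
function f = ClippedRegionToneCurveSketch(H, t2, m)
    N = length(H);
    k = (m - t2) / sum(1 - H(1:N)); % scale factor limiting the boost to m
    f = zeros(N, 1);
    f(1) = t2;
    for b=2:N
        f(b) = k * sum(1 - H(2:b)) + t2; % histogram equalization on 1 - H
    end
end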

4.6 Expand Map Models

The methods of Banterle et al. [19], its extensions [20, 22], and Rempel et
al. [182] use a guidance method to direct the expansion of the LDR content. Following the terminology used in Banterle et al. [19], these guidance
methods are referred to as expand map methods.

4.6.1 Nonlinear Expansion Using Expand Maps


A general framework for expanding LDR content for HDR monitors and
IBL was proposed by Banterle et al. [19, 20]. The key points are the use of


Figure 4.14. The pipeline of Banterle et al.'s method [19, 20].

an inverted TMO for expanding the range, combined with a smooth field for the reconstruction of the lost overexposed areas.
The first step of the framework is to linearize the input image. Figure 4.14 shows the pipeline. If the CRF is known, its inverse is applied to the signal. Otherwise, blind general methods can be employed, such as Lin et al.'s methods [121, 122]. Subsequently, the range of the image is expanded by inverting a TMO. In Banterle et al.'s implementation, the inverse of the global Reinhard et al. operator [180] was used. This is because the operator has only two parameters, and range expansion can be controlled in a straightforward way. This inverted TMO is defined as

L_w(x) = \frac{1}{2} L_{w,max} L_{white} \left( L_d(x) - 1 + \sqrt{(1 - L_d(x))^2 + \frac{4}{L_{white}^2} L_d(x)} \right), \qquad (4.9)

where L_{w,max} is the maximum output luminance in cd/m² of the expanded image, and L_{white} \in (1, +\infty) is a parameter that determines the shape of the expansion curve; it is proportional to the contrast. The authors suggested a value of L_{white} \geq L_{w,max} to increase the contrast while limiting artifacts due to expansion.
After range expansion, the expand map is computed. The expand map is a smooth field representing a low frequency version of the image in areas of high luminance. It has two main goals. The first is to reconstruct the lost luminance profiles in overexposed areas of the image. The second goal is to attenuate quantization or compression artifacts that can be enhanced during expansion. The expand map was implemented by applying density estimation on samples generated using importance sampling (median-cut sampling [54]). Finally, the expanded LDR image and the original one are combined using linear interpolation, where the expand map acts as the interpolation weight. Note that low luminance values are kept at their original value, which avoids compression of low values when L_{white} is set to a high value (around 10^4 or more). Otherwise, artifacts such as contours could appear.



Figure 4.15. Application of Banterle et al.'s method [19, 20] for relighting synthetic objects. (a) Lucy's model relit using St. Peter's HDR environment map. (b) Lucy's model relit using an expanded St. Peter's LDR environment map (starting at exposure 0). Note that the colors in (a) and (b) are close; this means that they are well reconstructed. Moreover, the reconstructed shadows in (b) follow the directions of the ones in (a), but they are less soft and present some aliasing. (The original St. Peter's HDR environment map is courtesy of Paul Debevec. The Lucy model is courtesy of the Stanford 3D Models Repository.)

The framework was extended for automatically processing images and videos in Banterle et al. [22]. This is achieved using three-dimensional sampling algorithms, volume density estimation, and a number of heuristics for determining the parameters of each component of the framework. Moreover, a colored expand map was adopted, allowing the reconstruction of clipped colors. An important feature of this extension is the possibility of transferring edges from the original frame to the expand map, thereby reducing distortions at edges.
The main problem of this extension is its speed, but real-time performance (more than 24 fps) on high definition content can be achieved using point-based graphics on a modern GPU [24]. This algorithm presents a general solution for visualization on HDR monitors and IBL (see Figure 4.15). Moreover, it was tested using HDR-VDP [135] for both tasks to prove its efficiency compared with simple exposure methods. The main limit of the framework is that large overexposed areas (more than 30% of the image) cannot be reconstructed using the expand map, since they produce smooth gray areas in the overexposed regions. This is because there is not enough information to exploit.
The Matlab code for Banterle et al.'s operator [22] can be found in the file BanterleEO.m. The method takes the following parameters as


% Luminance channel
L = lum(img);
maxL = max(max(L));
L = L / maxL;
% Luminance expansion
LWhite2 = LWhite^2;
Lexp = LWhite * LMaxOut * (L - 1 + sqrt((1 - L).^2 + 4 * L / LWhite2));
% Combining expanded and unexpanded luminance channels
expand_map = BanterleExpandMap(img, 0.95);
LFinal = zeros(size(img));
for i=1:3
    LFinal(:,:,i) = 2.^(log2(L + 1e-6) .* (1 - expand_map(:,:,i)) + log2(Lexp + 1e-6) .* expand_map(:,:,i));
    LFinal(:,:,i) = RemoveSpecials(LFinal(:,:,i));
end
% Removing the old luminance
imgOut = zeros(size(img));
if(~colorRec)
    Ltmp = lum(LFinal);
    for i=1:3
        LFinal(:,:,i) = Ltmp;
    end
end

Listing 4.9. Matlab Code: The EO by Banterle et al. [22].

input: LMaxOut (L_{w,max}), which is the maximum output luminance of the final expanded luminance in cd/m²; LWhite (L_{white}), which is the stretching parameter of the tone curve; and colorRec, a Boolean flag for enabling color reconstruction.
After running the common initialization of an EO, the operator normalizes the luminance channel. At this point, the luminance channel is expanded by applying Equation (4.9) and is stored in Lexp. Then, the expand map, expand_map, is calculated using the function BanterleExpandMap.m. Finally, L and Lexp are linearly interpolated using expand_map as interpolation weights. Note that expand_map takes into account the three color channels, meaning that the result of the interpolation, LFinal, has colors. If the flag colorRec is set to 0, LFinal is converted to a single channel image using the function lum.
The function BanterleExpandMap.m can be found under the EO folder. The Matlab code is shown in Listing 4.10 and Listing 4.11. Note that this code presents some differences from the original technique. To speed up computations in Matlab, the density estimation is approximated using Gaussian filtering. In the first part, the function calculates light samples by calling the function MedianCut.m, generating the maximum number of


if(perCent < 0 | perCent > 1)
    perCent = 0.95;
end
[r, c, col] = size(img);
nLights = 2.^(round(log2(min([r, c])) + 1));
[imgOut, lights] = MedianCut(img, nLights, 0);
% Generation of the histogram
window = round(min([r, c]) / (2 * sqrt(nLights)));
Lout = lum(imgOut);
H = zeros(length(lights), 1);
for i=1:length(lights)
    [X0, X1, Y0, Y1] = GenerateBBox(lights(i).x, lights(i).y, r, c, window);
    indx = find(Lout(Y0:Y1, X0:X1) > 0);
    H(i) = length(indx);
end
% sort H
H = sort(H);
Hcum = cumsum(H);
percentile = round(nLights * perCent);
[val, indx] = min(abs(Hcum - percentile));
thresholdSamples = H(indx);
% samples clamping
for i=1:length(lights)
    [X0, X1, Y0, Y1] = GenerateBBox(lights(i).x, lights(i).y, r, c, window);
    indx = find(Lout(Y0:Y1, X0:X1) > 0);
    if(length(indx) < thresholdSamples)
        X = ClampImg(round(lights(i).x * c), 1, c);
        Y = ClampImg(round(lights(i).y * r), 1, r);
        imgOut(Y, X, :) = 0;
        Lout(Y, X) = 0;
    end
end

Listing 4.10. Matlab Code: The first part of the expand map code by Banterle
et al. [22].

samples for the given image, nLights. At this point, samples that do not have enough neighbors are removed because they can introduce artifacts. This is achieved by calculating a histogram H for each sample with the number of neighbors as the entry. From this histogram, the sample with a number of neighbors equal to a percentage, percentile, of nLights is chosen. The number of neighbors of this sample, thresholdSamples, is used as a threshold to remove samples.
In the second part, Listing 4.11, the function filters the rasterized samples, imgOut, and transfers strong edges from the original LDR image onto imgOut. In this case Lischinski's minimization function,

LLog = log2(lum(img) + 1e-6);
expand_map = zeros(size(img));
% Quick "density estimation"
CFiltered = GaussianFilterWindow(imgOut, window * 8);
for i=1:3
    % Edge transfer
    expand_map(:,:,i) = LischinskiMinimization(LLog, CFiltered(:,:,i), 0.07 * ones([r, c]));
end
% Normalization
expand_map = expand_map / max(max(max(expand_map)));

Listing 4.11. Matlab Code: The second part of the expand map code by Banterle
et al. [22].

LischinskiMinimization.m, is used instead of the bilateral filter. Note that the function bilateralFilter.m can produce some artifacts when large kernels are employed. Finally, the expand map (expand_map) is normalized.

4.6.2 LDR2HDR: On-the-Fly Reverse Tone Mapping of Legacy Video and Photographs

A similar technique based on expand maps was proposed by Rempel et al. [182]. Their goal was real-time LDR expansion for videos and images. The algorithm pipeline is shown in Figure 4.16.
The first step of the LDR2HDR algorithm is to remove artifacts due to the compression algorithms of media (such as MPEG [Moving Picture Experts Group]) by using a bilateral filter with small intensity and spatial kernels. Sophisticated artifact removal is not employed due to real-time constraints. The next step of the method is to linearize the signal using an inverse gamma function. Once the signal is linearized, the contrast is stretched in a way optimized for the Dolby DR-37P HDR monitor [57]. A simple linear contrast stretching is applied to boost values; however,

Figure 4.16. The pipeline of Rempel et al.'s method [182].



Figure 4.17. Application of Rempel et al.'s method [182] to the Sunset image. (a) Original LDR image. (b), (c), (d) Different f-stops after expansion.

the authors limited the maximum contrast to 5,000:1 to avoid artifacts. This means that the minimum value was mapped to 0.015 cd/m² while the maximum was mapped to 1,200 cd/m². To enhance brightness in bright regions, a brightness enhancement function (BEF) is employed. This function is calculated by applying a threshold of 0.92 (on a scale of [0, 1] for LDR values). At this point the image is Gaussian filtered using a filter with \sigma = 30 (150 pixels), which is chosen for 1920 × 1080 content. In order to increase contrast around edges, an edge stopping function is used. Starting from saturated pixels, a flood-fill strategy is applied until an edge is reached, which is estimated using gradients. Subsequently, a morphological operator followed by a Gaussian filter with a smaller kernel is applied to remove noise. The BEF is mapped to the interval [1, \alpha], where \alpha = 4, and is finally multiplied with the scaled image to generate the HDR image (see Figure 4.17). To improve efficiency, the BEF is calculated using Laplacian pyramids [29], which can be implemented on the GPU or FPGA in an efficient way.
The algorithm was evaluated using HDR-VDP [135] comparing the linearized starting image with the generated HDR image. This evaluation
was needed to show that the proposed method does not introduce spatial


artifacts during the expansion of the content. Note that LDR2HDR processes each frame separately and may not be temporally coherent due to the nature of the BEF.
Kovaleski and Oliveira [103] proposed an improvement of Rempel et al.'s method [182] that exploits the bilateral filter. This is used for noise reduction and for generating the BEF, which is computed by applying the bilateral filter to the overexposed pixels in the luminance channel. To speed up the computation, the method uses the bilateral grid data structure [34] on the GPU. This achieves real-time performance, around 25 fps, on full HD content without noise reduction and subsampling. Another advantage of this technique is that the computed BEF introduces fewer distortions than Rempel et al.'s method [182].
The Matlab code for Rempel et al.'s EO [182] can be found in the file RempelEO.m. The method takes as input a Boolean flag, noiseReduction, that is required if noise removal is needed.
The code of Rempel et al.'s EO is shown in Listing 4.12. After the common initial steps are performed, if the flag noiseReduction is set to 1, each color channel of the image is filtered by applying a bilateral filter with \sigma_r = 0.05 and \sigma_s = 0.8. Then the luminance channel, L, is expanded by applying a linear expansion and stored in Lexp. Subsequently, the BEF is calculated using the Matlab function RempelExpandMap.m on L and is stored in expand_map after scaling it to the interval [1, \alpha]. Finally, Lexp is multiplied by expand_map, obtaining the final expanded luminance channel.
% noise reduction using a gentle bilateral filter of size 4 pixels
% (which is equal to sigma_s=0.8 sigma_r=0.05)
if(noiseReduction)
    for i=1:3
        minC = min(min(img(:,:,i)));
        maxC = max(max(img(:,:,i)));
        img(:,:,i) = bilateralFilter(img(:,:,i), [], minC, maxC, 0.8, 0.05);
    end
end
% Luminance channel
L = lum(img);
% maximum luminance as in the original paper
maxL = 1200.0;
% rescale alpha as in the original paper
rescale_alpha = 4.0;
% Luminance expansion
Lexp = (L + 1/256) * (maxL - 0.3);
% Generate expand map
expand_map = RempelExpandMap(L);
% Remap expand map range in [1,...,rescale_alpha]
expand_map = expand_map * (rescale_alpha - 1) + 1;
% Final HDR Luminance
Lfinal = expand_map .* Lexp;

Listing 4.12. Matlab Code: The EO by Rempel et al. [182].

The function RempelExpandMap.m can be found under the EO folder. The Matlab code is shown in Listing 4.13 and Listing 4.14. In the first part of this function, Listing 4.13, a threshold (threshold) is applied to L, obtaining a mask (mask). This is filtered using a Gaussian filter with a window of size 150 × 150, and the result is stored in sbeFil with normalized values.
% saturated pixels threshold
thresholdImg = 254/255;   % Images
thresholdVideo = 230/255; % Videos
if(~exist('video_flag'))
    video_flag = 0;
end
if(video_flag)
    threshold = thresholdVideo;
else
    threshold = thresholdImg;
end
% binary map for saturated pixels
indx = find(L > threshold);
mask = zeros(size(L));
mask(indx) = 1;
mask = double(bwmorph(mask, 'clean'));
% mask = double(CleanWell(mask, 1));
% Filtering with a 150x150 Gaussian kernel size
sbeFil = GaussianFilter(mask, 30);
% Normalization
sbeFilMax = max(max(sbeFil));
if(sbeFilMax > 0.0)
    sbeFil = sbeFil / sbeFilMax;
end

Listing 4.13. Matlab Code: The first part of RempelExpandMap.m for generating the expand map of an image in Rempel et al. [182].


% 5x5 Gradient masks for thick gradients
Sy = [-1, -4, -6, -4, -1; ...
      -2, -8, -12, -8, -2; ...
       0,  0,   0,  0,  0; ...
       2,  8,  12,  8,  2; ...
       1,  4,   6,  4,  1];
Sx = Sy';
dy = imfilter(L, Sy);
dx = imfilter(L, Sx);
% magnitude of the directional gradient
grad = sqrt(dx.^2 + dy.^2);
grad = grad / max(max(grad));
% threshold for the gradient
tr = 0.05;
% maximum number of iterations for the flood fill
maxIter = 1000;
for k=1:maxIter
    % Flood fill
    tmp = double(bwmorph(mask, 'dilate'));
    tmp = abs(tmp - mask);
    indx = find(tmp > 0 & grad < tr);
    mask(indx) = 1;
    % ended?
    stopping = length(indx);
    if(stopping < 1)
        break;
    end
end
% Filtering with a 5x5 Gaussian Kernel
mask2 = GaussianFilter(double(mask), 1);
% Multiply the flood fill mask with the BEF
expand_map = sbeFil .* mask2;

Listing 4.14. Matlab Code: The second part of RempelExpandMap.m in Rempel et al. [182].

In the second part of RempelExpandMap.m, Listing 4.14, the gradients of L are computed and stored in grad using the Matlab function imfilter. Then, mask is expanded using a flood-fill mechanism (in the for loop) until strong edges in grad are found. This new mask is filtered using a Gaussian filter with a window of size 5 and is stored in mask2. Finally, the output mask, expand_map, is computed by multiplying mask2 and sbeFil.

4.7 User-Based Models: HDR Hallucination

Since it may not always be possible to recover missing HDR content using automatic approaches, a different, user-based approach was proposed by Wang et al. [218], whereby detailed HDR content can be added to areas that are meant to be expanded. The authors demonstrated the benefits of an in-painting system to recover lost details in overexposed and underexposed regions of the image, combined with a boosting of the luminance. The whole process was termed hallucination, and their system presents a mixture between automatic and user-based approaches.
The first step of hallucination is to linearize the signal. The pipeline is shown in Figure 4.18. This is achieved with an inverse gamma function with \gamma = 2.2, which is the standard value for DVDs and television formats [92]. After this step, the image is decomposed into large-scale illumination and fine texture details. A bilateral filter is applied to the image I, obtaining a filtered version I_f. The texture details are obtained as I_d = I / I_f. The radiance for the large-scale illumination I_f is estimated using a linear interpolation of elliptical Gaussian kernels. Firstly, a weight map, w, is calculated for each pixel:

w(x) = \begin{cases} \frac{C_{ue} - Y(x)}{C_{ue}} & Y(x) \in [0, C_{ue}),\\ 0 & Y(x) \in [C_{ue}, C_{oe}),\\ \frac{Y(x) - C_{oe}}{1 - C_{oe}} & Y(x) \in [C_{oe}, 1], \end{cases}

where Y(x) = R_s(x) + 2G_s(x) + B_s(x), and C_{ue} and C_{oe} are respectively the thresholds for underexposed and overexposed pixels. The authors suggested values of 0.05 and 0.85 for C_{ue} and C_{oe}, respectively. Secondly, each overexposed region is segmented and fitted with an elliptical Gaussian lobe

Figure 4.18. The pipeline of the Wang et al. [218] method. (The original image is courtesy of Ahmet Oğuz Akyüz.)


G, where the variances of the axes are estimated using the region extents, and the profile is calculated using an optimization procedure based on non-overexposed pixels at the edge of the region. The luminance is blended using a simple linear interpolation,

O(x) = w(x)\,G(x) + (1 - w(x)) \log_{10} Y(x).

Optionally, users can add Gaussian lobes using a brush. The texture details, I_d, are reconstructed using a texture synthesis technique similar to [25], where the user can select an area as a source region by drawing it with a brush. This automatic synthesis has some limits when scene understanding is needed; therefore, a warping tool is included. This allows the user to select, with a stroke-based interface, a source region and a target region between which pixels will be transferred. This is a tool similar to the stamp and healing tools in Adobe Photoshop [6]. Finally, the HDR image is built by blending the detail and the large-scale illumination. This is performed using Poisson image editing [170] in order to avoid seams in the transition between expanded overexposed areas and well-exposed areas.
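A minimal sketch of the weight map w(x) defined above is given below. It is not part of the HDR Toolbox, and the normalization of Y to [0,1] is an assumption made in this sketch.

% A minimal sketch of the per-pixel weight map w used to blend the fitted
% Gaussian lobes with the original luminance. img is assumed to be a
% linearized RGB image in [0,1].
function w = HallucinationWeightMapSketch(img, Cue, Coe)
    if(~exist('Cue', 'var')), Cue = 0.05; end
    if(~exist('Coe', 'var')), Coe = 0.85; end
    Y = (img(:,:,1) + 2 * img(:,:,2) + img(:,:,3)) / 4; % Y(x) = R + 2G + B, scaled to [0,1] (assumption)
    w = zeros(size(Y));
    under = Y < Cue;   % underexposed pixels
    over  = Y >= Coe;  % overexposed pixels
    w(under) = (Cue - Y(under)) / Cue;
    w(over)  = (Y(over) - Coe) / (1 - Coe);
    % well-exposed pixels in [Cue, Coe) keep w = 0
end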
This system can be used for both IBL and visualization of images, and
compared with other algorithms it may maintain details in clamped regions.
However, the main problem of this approach is that it is user-based and not
automatic, which potentially limits its use to single images and not videos.

4.8 Summary

An overview of all the methods discussed is presented in Table 4.1. This summarizes what techniques are used and how they compare in terms of quality and performance. Most of the methods expand the dynamic range using either a linear function (with/without a remapping of the range to artificially increase it) or a nonlinear function, while Meylan et al. use a two-scale linear function. The reconstruction methods aim at smoothly expanding the dynamic range, and a variety of methods are proposed. Unsurprisingly, the choice of expansion method and reconstruction influences the computational performance of the method and its quality. Performances are based on the timings from the individual papers and/or the complexity of the computation involved, where fast performance would make it possible to run in real time on current hardware, while slow would require a handful of seconds. Wang et al.'s method requires manual intervention, somewhat hindering real-time performance. The quality results were presented in other publications, primarily the psychophysical experiments shown in Banterle et al. [23]. It is clear that different methods are suitable for different applications. The more straightforward methods are faster and

Method    | Expansion Function | Reconstruction + Noise Reduction       | Speed        | Quality
ADHCD^T   | Linear             | Additive Noise                         | Fast         | Good
CRHCD^T   | Linear             | Filtering                              | Fast         | Good
PFMRE     | Nonlinear          | N/A                                    | Fast         | Good IBL
LSFHD^T   | Linear             | N/A                                    | Fast         | Average
GEOEL^T   | Nonlinear          | N/A                                    | Fast         | Good
HGFHD^T   | Linear             | Filtering                              | Fast         | Average, Good Highlights
EBVFHD^T  | Nonlinear          | Filtering + Classification             | Slow, Manual | Good
NLEUEM^T  | Nonlinear          | Expand Map                             | Fast in HW   | Good
LDR2HDR^T | Linear             | Expand Map                             | Fast in HW   | Good
HDRH      | Nonlinear          | Bilateral Filtering + Texture Transfer | Manual       | Good

Table 4.1. Classification of algorithms for expansion of LDR content. Superscript T means that the operator is temporal and suitable for LDR video expansion, while T* means that the operator can potentially be used for LDR video expansion. The quality assessments are based on psychophysical studies in Didyk et al. [56], Banterle et al. [23], and Masia et al. [139]; the Daly and Feng operators are designed for medium dynamic range monitors and not for IBL. See Table 4.2 for a clarification of the key.

Key     | Name
ADHCD   | Amplitude Dithering for High Contrast Displays [46]
CRHCD   | Contouring Removal for High Contrast Displays [47]
PFMRE   | A Power Function Model for Range Expansion [109]
LSFHD   | Linear Scaling for HDR Monitors [14]
GEOEL   | Gamma Expansion for Overexposed LDR Images [139]
HGFHD   | Highlights Reproduction for HDR Monitors [145, 146]
EBVFHD  | Enhancement of Bright Video Features for HDR Displays [56]
NLEUEM  | Nonlinear Expansion Using Expand Maps [19, 20, 22]
LDR2HDR | On-the-Fly Reverse Tone Mapping of Legacy Video and Photographs [182]
HDRH    | HDR Hallucination [218]

Table 4.2. Key to EOs for Table 4.1.


more suitable for IBL or for just improving highlights. For more complex
still scenes and/or videos where further detail may be desirable, the more
complex expansion methods are preferable.


5 Image-Based Lighting

Since HDR images may represent the true physical properties of the lighting at a given point, they can improve the rendering process. In particular, a series of techniques commonly referred to as image-based lighting (IBL) provide a method of accelerating the computation of digital images by rendering images lit by HDR images that are, almost always, generated by capturing the entire sphere of lighting at a point in a real scene. Effectively, IBL methods render images by using the captured HDR image as the lighting in shading computations, recreating in the virtual scene the physical lighting conditions of the real scene. This results in realistic-looking images that have been embraced by the film and games industries.

5.1 Environment Map

IBL usually takes as input an image, termed the environment map, that captures irradiance values of the real-world environment for each direction, D = [x, y, z]^T, around a point. Therefore, an environment map can be parameterized on a sphere. Different two-dimensional projection mappings of a sphere can be adopted to encode the environment map. The most popular methods used in computer graphics are: angular mapping (Figure 5.1(a)), cube mapping (Figure 5.1(b)), and latitude-longitude mapping (Figure 5.1(c)).
Listing 5.1 provides the Matlab code for converting between one projection and another. The function ChangeMapping.m, which can be found under the EnvironmentMaps folder, accepts as input the environment map to be converted, img. Two strings representing the original mapping and



Figure 5.1. The Computer Science environment map encoded using the projection
mappings. (a) Angular map. (b) Cube-map unfolded into a horizontal cross.
(c) Latitude-longitude map.

the one to be converted to are also passed, via the parameters mapping1 and mapping2, respectively. This function operates by first converting the original mapping into a series of directions via the functions LL2Direction.m, Angular2Direction.m, and CubeMap2Direction.m, which can be found under the EnvironmentMaps folder. These represent conversions from original maps stored using latitude-longitude, angular, and cube-map representations, respectively. The second step converts from the set of directions towards the second mapping using the functions Direction2LL.m, Direction2Angular.m, and Direction2CubeMap.m, which can be found under the EnvironmentMaps folder. Finally, bilinear interpolation is used for the final images. Note that some mapping methods, such as the angular and cube mapping, generate images that do not cover the full area of a rectangular image. During interpolation, these empty areas are set to invalid values (i.e., NaN or Inf float values). In order to remove these invalid values, some masks are generated. These masks


are created using the functions CrossMask.m and AngularMask.m that are
respectively used for the cube and angular mapping methods. These functions may be found under the EnvironmentMaps folder.
function imgOut = ChangeMapping(img, mapping1, mapping2)
% Is it a three colour channels image?
check3Color(img);
% First step: generation of directions
D = [];
[r, c, col] = size(img);
switch mapping1
    case 'LongitudeLatitude'
        D = LL2Direction(r, c);
    case 'Angular'
        D = Angular2Direction(r, c);
    case 'CubeMap'
        D = CubeMap2Direction(r, c);
    otherwise
        error('ChangeMapping: The following mapping is not recognized.');
end
% Second step: interpolation of values
imgOut = [];
flag = 0;
switch mapping2
    case 'LongitudeLatitude'
        if(strcmpi(mapping1, mapping2) == 1)
            imgOut = img;
        else
            [X1, Y1] = Direction2LL(D);
            maxCoord = max([r, c]);
            img = imresize(img, [maxCoord, maxCoord * 2], 'bilinear');
            flag = 1;
        end
    case 'Angular'
        if(strcmpi(mapping1, mapping2) == 1)
            imgOut = img;
        else
            [X1, Y1] = Direction2Angular(D);
            flag = 1;
        end
    case 'CubeMap'
        if(strcmpi(mapping1, mapping2) == 1)
            imgOut = img;
        else
            [X1, Y1] = Direction2CubeMap(D);
            flag = 1;
        end
    otherwise
        error('ChangeMapping: The following mapping is not recognized.');
end
if(flag)
    % Interpolation
    [r, c] = size(X1);
    [X0, Y0] = meshgrid(1:c, 1:r);
    img = imresize(img, [r, c], 'bilinear');
    imgOut = interpCoords(img, X0, Y0, X1, Y1);
    switch mapping2
        case 'CubeMap'
            imgOut = imgOut .* CrossMask(r, c);
        case 'Angular'
            imgOut = imgOut .* AngularMask(r, c);
    end
end
end

Listing 5.1. Matlab Code: Changing an environment map from one two-dimensional projection to another.

5.1.1 Latitude-Longitude Mapping


A typical environment mapping is the so-called latitude-longitude mapping, which maps the sphere into a rectangular domain. The mapping from the rectangular domain into directions on the sphere is given as

$$\begin{bmatrix} D_x \\ D_y \\ D_z \end{bmatrix} = \begin{bmatrix} \sin\phi\,\sin\theta \\ \cos\theta \\ -\cos\phi\,\sin\theta \end{bmatrix}, \qquad \begin{bmatrix} \phi \\ \theta \end{bmatrix} = \begin{bmatrix} 2\pi x \\ \pi(y - 1) \end{bmatrix}, \quad (5.1)$$
where x ∈ [0, 1] and y ∈ [0, 1]. Equation (5.1) transforms texture coordinates [x, y]^T to spherical ones [φ, θ]^T and finally to direction coordinates [D_x, D_y, D_z]^T. Listing 5.2 provides the Matlab code for converting from a latitude-longitude representation to a set of directions following the above equation. The main advantage of this mapping is that it is easy to understand and implement. However, it is not equal-area, since pixels cover different areas on the sphere. For example, pixels at the equator cover more area than pixels at the poles. This problem has to be taken into account when these environment maps are sampled.
function D = LL2Direction(r, c)
[X0, Y0] = meshgrid(1:c, 1:r);

phi   = pi * 2 * (X0 / c);
theta = pi * (Y0 / r - 1);

sinTheta = sin(theta);

D = zeros(r, c, 3);
D(:,:,1) = sin(phi) .* sinTheta;
D(:,:,2) = cos(theta);
D(:,:,3) = -cos(phi) .* sinTheta;
end

Listing 5.2. Matlab Code: Converting from a latitude-longitude representation to a set of directions as in Equation (5.1).

The inverse mapping, from a direction on the sphere into the rectangular domain, is given as

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 + \frac{1}{\pi}\arctan(D_x, -D_z) \\[4pt] \frac{1}{\pi}\arccos(D_y) \end{bmatrix}. \quad (5.2)$$

Listing 5.3 provides the Matlab code for converting the set of directions into a latitude-longitude representation. The initial image resize in
the listing adjusts the input coordinates to have the same aspect ratio as
the output mapping. The rest of the code implements Equation (5.2).
function [X1, Y1] = Direction2LL(D)
% Resampling
[r, c, d] = size(D);
maxCoord = max([r, c]);
D = imresize(D, [maxCoord, maxCoord * 2], 'bilinear');
% Coordinates generation
X1 = 1 + atan2(D(:,:,1), -D(:,:,3)) / pi;
Y1 = acos(D(:,:,2)) / pi;
X1 = RemoveSpecials(X1) * maxCoord;
Y1 = RemoveSpecials(Y1) * maxCoord;
end

Listing 5.3. Matlab Code: Converting from a set of directions to a latitude-longitude representation as in Equation (5.2).
5.1.2 Angular Mapping


Another type of environment mapping is angular mapping, which maps a sphere into a two-dimensional circular area. The mapping from the two-dimensional circular area into directions on the sphere is given as

$$\begin{bmatrix} D_x \\ D_y \\ D_z \end{bmatrix} = \begin{bmatrix} \cos\phi\,\sin\theta \\ \sin\phi\,\sin\theta \\ -\cos\theta \end{bmatrix}, \qquad \begin{bmatrix} \phi \\ \theta \end{bmatrix} = \begin{bmatrix} \arctan(1 - 2y,\ 2x - 1) \\ \pi\sqrt{(2x - 1)^2 + (2y - 1)^2} \end{bmatrix}, \quad (5.3)$$
where x ∈ [0, 1] and y ∈ [0, 1]. The inverse mapping is

$$\begin{bmatrix} x \\ y \end{bmatrix} = \frac{1}{2} + \frac{\arccos(-D_z)}{2\pi\sqrt{D_x^2 + D_y^2}}\begin{bmatrix} D_x \\ D_y \end{bmatrix}. \quad (5.4)$$

This mapping avoids undersampling at the edges of the circle but, as with the latitude-longitude mapping, it is not equal-area.
The functions Angular2Direction.m and Direction2Angular.m implement Equation (5.3) and Equation (5.4), respectively. These functions are not shown in the book, and we refer the reader to their implementation in the HDR Toolbox for further details.
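To make Equation (5.3) concrete, the following is a minimal sketch of how per-pixel directions could be generated for an r × c angular map, mirroring the structure of LL2Direction.m in Listing 5.2. The function name is illustrative, and the actual Angular2Direction.m in the HDR Toolbox may differ in details such as sign conventions and the handling of pixels outside the circle.

function D = Angular2DirectionSketch(r, c)
% Normalized pixel coordinates in [0,1]
[X0, Y0] = meshgrid((1:c) / c, (1:r) / r);
% Equation (5.3): in-plane angle and angle from the forward axis
phi   = atan2(1 - 2 * Y0, 2 * X0 - 1);
theta = pi * sqrt((2 * X0 - 1).^2 + (2 * Y0 - 1).^2);
D = zeros(r, c, 3);
D(:,:,1) =  cos(phi) .* sin(theta);
D(:,:,2) =  sin(phi) .* sin(theta);
D(:,:,3) = -cos(theta);
end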

5.1.3 Cube Mapping


Another popular mapping is cube mapping, which maps a sphere onto a cube. The cube is usually represented as six different images or as an open cube unfolded onto a two-dimensional plane shaped as a cross. In this book, we use the cross-shaped representation. Since the equations for each face are similar, differing only in small changes, we present the equations for a single face (the front face of the cube). The equations for the other faces can be obtained by swizzling coordinates and signs in the presented equations. The mapping from the front face into directions on the sphere is given as

$$\begin{bmatrix} D_x \\ D_y \\ D_z \end{bmatrix} = \frac{1}{\sqrt{1 + (2x - 1)^2 + (2y - 1)^2}}\begin{bmatrix} 2x - 1 \\ 2y - 1 \\ -1 \end{bmatrix}, \quad \text{if } x \in \left[\tfrac{1}{3}, \tfrac{2}{3}\right],\ y \in \left[\tfrac{1}{2}, \tfrac{3}{4}\right].$$

The inverse mapping is

$$\begin{bmatrix} x \\ y \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 - \frac{D_x}{D_z} \\[4pt] 1 - \frac{D_y}{D_z} \end{bmatrix}, \quad \text{if } (D_z < 0) \wedge (-D_z \geq |D_x|) \wedge (-D_z \geq |D_y|).$$
The main advantage of cube mapping is that it is straightforward to
implement on graphics hardware. Only integer and Boolean operations are
needed to fetch pixels. The main disadvantage is that it is not equal-area.
Therefore, sampled pixels need to be weighted before they can be used.
Details of CubeMap2Direction.m and Direction2CubeMap.m, based on
the above equations, are not given in the book but are part of the HDR
Toolbox. Please refer to the HDR Toolbox for details.
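Similarly, the following is a minimal sketch of the forward mapping for a single face: given face-local coordinates u, v ∈ [0, 1] of the front face, it produces normalized directions, leaving out the offsets that place the face inside the cross image. It is only illustrative; the full CubeMap2Direction.m in the HDR Toolbox handles all six faces and the cross layout.

function D = FrontFace2DirectionSketch(r, c)
% Face-local coordinates u, v in [0,1] for an r x c front face (looking down -z).
[U, V] = meshgrid((1:c) / c, (1:r) / r);
% Unnormalized directions towards the z = -1 plane of the unit cube.
Dx = 2 * U - 1;
Dy = 2 * V - 1;
% Normalize so that each direction lies on the unit sphere.
len = sqrt(1 + Dx.^2 + Dy.^2);
D = zeros(r, c, 3);
D(:,:,1) = Dx ./ len;
D(:,:,2) = Dy ./ len;
D(:,:,3) = -1 ./ len;
end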

5.2 Rendering with IBL

IBL was used to simulate perfect specular effects such as pure specular reflection and refraction in the seminal work by Blinn and Newell [26]. It must be noted that at the time of that publication, the reflections were limited to LDR images; however, the method applies directly to HDR images. The reflected/refracted vector at a surface point x is used as a look-up into the environment map, and the color at that address is used as the reflected/refracted value (see Figure 5.2). This method allows very fast reflection/refraction (see Figure 5.3(a) and Figure 5.3(b)). However,

Figure 5.2. The basic Blinn and Newell [26] method for IBL. (a) The reflective case: the view vector v is reflected around the normal n, obtaining the vector r = v - 2(n · v)n, which is used as a look-up into the environment map to obtain the color value t. (b) The refractive case: the view vector v coming from a medium with index of refraction n1 enters a medium with index of refraction n2 < n1. Therefore, v is refracted following Snell's law, n1 sin θ1 = n2 sin θ2, obtaining r. This vector is used as a look-up into the environment map to obtain the color value t.

Figure 5.3. An example of classic IBL using environment maps applied to Stanford's Happy Buddha model [78]. (a) Simulation of a reflective material. (b) Simulation of a refractive material. (c) Simulation of a diffuse material.

there are some drawbacks. Firstly, concave objects cannot have internal inter-reflections/refractions, because the environment map does not take into account local features (see Figure 5.4(a)). Secondly, reflection/refraction can be distorted, since there is a parallax between the evaluation point and the point where the environment map was captured (see Figure 5.4(b)).
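As a concrete illustration of the look-up described above, the following is a minimal sketch for a single shading point: the view vector is reflected about the normal, and the resulting direction is converted to latitude-longitude coordinates (Equation (5.2)) to fetch a color from the environment map. The function name and the nearest-neighbor fetch are illustrative, and the sketch deliberately ignores the parallax and inter-reflection issues just discussed.

function t = ReflectionLookupSketch(v, n, envMap)
% v: unit view vector pointing from the eye towards the surface (3 x 1).
% n: unit surface normal (3 x 1). envMap: latitude-longitude HDR map.
r = v - 2 * dot(n, v) * n;               % reflected vector, r = v - 2(n . v)n
r = r / norm(r);
[rows, cols, ~] = size(envMap);
% Latitude-longitude look-up, as in Equation (5.2).
x = (1 + atan2(r(1), -r(3)) / pi) / 2;   % in [0,1]
y = acos(r(2)) / pi;                     % in [0,1]
col = min(max(round(x * cols), 1), cols);
row = min(max(round(y * rows), 1), rows);
t = squeeze(envMap(row, col, :))';       % fetched color value t
end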
In parallel, Miller and Hoffman [148] and Green [76] extended IBL for simulating diffuse effects (see Figure 5.3(c)). This was achieved by convolving the environment map with a low-pass kernel:

$$E(\mathbf{n}) = \int_{\Omega(\mathbf{n})} L(\omega)\,(\mathbf{n} \cdot \omega)\, d\omega, \quad (5.5)$$

Figure 5.4. The basic Blinn and Newell [26] method for IBL. (a) The point x inside the concavity erroneously uses t1 instead of t2 as the color for refraction/reflection. This is due to the fact that the environment map does not capture local features. (b) In this case, both reflected/refracted rays for the blue and red objects are pointing in the same direction but from different starting points. However, the evaluation does not take the parallax into account, so x1 and x2 share the same color t1.

where L is the environment map, n is a direction in the environment map, and Ω(n) is the positive hemisphere around n. An example of a convolved environment map is shown in Figure 5.5. In this case, the look-up vector for a point is its normal. Nevertheless, this extension inherits the same problems as Blinn and Newell's method, namely, parallax issues and no inter-reflections.
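As an illustration of Equation (5.5), the following is a brute-force sketch that convolves a latitude-longitude environment map with the clamped cosine lobe for a single normal n. It assumes the solid angle of a pixel is proportional to sin θ and is only meant to show the structure of the computation; a practical implementation would precompute the convolved map for all normals far more efficiently.

function E = DiffuseConvolutionSketch(img, n)
% img: latitude-longitude environment map (r x c x 3). n: unit normal (3 x 1).
[r, c, ~] = size(img);
D = LL2Direction(r, c);                 % per-pixel directions (Listing 5.2)
% Solid angle of each pixel: dPhi * dTheta * sin(theta).
theta = acos(D(:,:,2));
dOmega = (2 * pi / c) * (pi / r) * sin(theta);
% Clamped cosine between the normal and each direction.
cosTerm = max(D(:,:,1) * n(1) + D(:,:,2) * n(2) + D(:,:,3) * n(3), 0);
w = cosTerm .* dOmega;
E = zeros(1, 3);
for i = 1:3
    E(i) = sum(sum(img(:,:,i) .* w));   % Equation (5.5) for this normal
end
end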
Debevec [53] proposed a general method for IBL that takes into account arbitrary BRDFs and inter-reflections. In addition, he used environment maps composed solely of HDR images that, as already noted, encode real-world irradiance data. The proposed method is based on ray tracing (see Section 2.1.3). The evaluation, for each ray shot through a given pixel, is divided into the following cases:

1. No intersections. The ray does not hit an object in its traversal of the scene. In this case the color of the pixel is set to that of the environment map, using the direction of the ray as the look-up vector.

2. Pure specular. The ray intersects an object with a pure specular

material. In this case the ray is reflected and/or refracted according to the material properties.

Figure 5.5. The Computer Science environment map filtered for simulating diffuse reflections. (a) The original environment map. (b) The convolved environment map using Equation (5.5).
3. General material. The ray intersects an object with a general material described by a BRDF. In this case a modified Rendering Equation [95] is evaluated as

$$L(x, \omega) = L_e + \int_{\Omega} L(\omega')\, f_r(\omega', \omega)\, V(x, \omega')\,(\mathbf{n} \cdot \omega')\, d\omega', \quad (5.6)$$

where x and n are, respectively, the position and normal of the hit object; L_e is the emitted radiance at point x, L is the environment

map, f_r is the BRDF, ω' is the out-going direction, and ω is the view vector. The function V is the visibility function, a Boolean function that determines whether a ray is obstructed by an object or not.

Figure 5.6. An example of IBL evaluating visibility, applied to Stanford's Happy Buddha model [78]. (a) IBL evaluation without shadowing. (b) IBL evaluation with Debevec's method [53].
The use of ray tracing removes the inter-reflection/refraction limitations of the first approaches. Furthermore, visibility is evaluated, allowing for shadows and a generally more realistic visualization (see Figure 5.6).
The evaluation of IBL using Equation (5.6) is computationally very expensive. For instance, if all directions stored in an environment map are used in the evaluation, the complexity for a single pixel is O(n f(m)), where n is the number of pixels of the environment map, and f(m) = log m is the complexity of computing the visibility in a scene with m objects. To lower the computational complexity, two general approaches are employed: light source generation or Monte-Carlo integration.

5.2.1 Light Source Generation

A commonly used set of methods for the evaluation of Equation (5.6) is to generate a finite set of directional light sources from the environment map. The basic idea of these methods is to place light sources at the locations of areas of the environment with high luminance values and, when rendering, to use these light sources directly for the illumination.
A number of techniques based on light source generation have been proposed. The most popular are: structured importance sampling (SIS) [7], k-means sampling (KMS) [101], Penrose tiling sampling (PTS) [161], and median cut sampling (MCS) [54]. The main difference between these methods lies in where the generated light sources are placed. In SIS, a light is placed in the center of a stratum generated by k-center on a segmented environment map. In KMS, lights are generated randomly on the environment map and then relaxed using Lloyd's method [126]. In PTS, the image is decomposed using Penrose tiles, where smaller tiles are applied to areas with high luminance, and a light source is placed at each vertex of a tile. Finally, in MCS, the image is hierarchically decomposed into a two-dimensional tree that recursively subdivides the area into regions of equal luminance until there are as many regions as light sources required. The light sources are placed at the weighted center of each region.
function lights = MedianCut(img, nlights, falloff)
global L;
global imgWork;
global limitSize;
global nLights;
global lights;

if (falloff)
    img = FallOffEnvMap(img);
end
% Global variables initialization
L = lum(img);
imgWork = img;
nLights = round(log2(nlights));
[r, c] = size(L);
limitSize = 2; % limitSize = max([c, r]) / 2^nluce;
lights = [];
if (c > r)
    MedianCutAux(1, c, 1, r, 0, 1);
else
    MedianCutAux(1, c, 1, r, 0, 0);
end
end

Listing 5.4. Matlab Code: Median cut for light source generation.
function done = MedianCutAux(xMin, xMax, yMin, yMax, iter, cut)
global L;
global imgWork;
global limitSize;
global nLights;
global lights;

done = 1;
lx = xMax - xMin;
ly = yMax - yMin;
if ((lx > limitSize) && (ly > limitSize) && (iter < nLights))
    tot = sum(sum(L(yMin:yMax, xMin:xMax)));
    pivot = -1;
    if (cut == 1)
        % Cut on the X-axis
        for i = xMin:xMax
            c = sum(sum(L(yMin:yMax, xMin:i)));
            if (c >= (tot - c) && pivot == -1)
                pivot = i;
            end
        end
        if (lx > ly)
            MedianCutAux(xMin, pivot, yMin, yMax, iter + 1, 1);
            MedianCutAux(pivot + 1, xMax, yMin, yMax, iter + 1, 1);
        else
            MedianCutAux(xMin, pivot, yMin, yMax, iter + 1, 0);
            MedianCutAux(pivot + 1, xMax, yMin, yMax, iter + 1, 0);
        end
    else
        % Cut on the Y-axis
        for i = yMin:yMax
            c = sum(sum(L(yMin:i, xMin:xMax)));
            if (c >= (tot - c) && pivot == -1)
                pivot = i;
            end
        end
        if (ly > lx)
            MedianCutAux(xMin, xMax, yMin, pivot, iter + 1, 0);
            MedianCutAux(xMin, xMax, pivot + 1, yMax, iter + 1, 0);
        else
            MedianCutAux(xMin, xMax, yMin, pivot, iter + 1, 1);
            MedianCutAux(xMin, xMax, pivot + 1, yMax, iter + 1, 1);
        end
    end
else
    % Generation of the light source
    lights = [lights, CreateLight(xMin, xMax, yMin, yMax, L, imgWork)];
end
end

Listing 5.5. Matlab Code: Recursive part of median cut for light source generation.

function newLight = CreateLight(xMin, xMax, yMin, yMax, L, img)
tot = (yMax - yMin + 1) * (xMax - xMin + 1);
totL = sum(sum(L(yMin:yMax, xMin:xMax)));
if ((tot > 0) && (totL > 0))
    col = reshape(img(yMin:yMax, xMin:xMax, :), tot, 1, 3);
    value = sum(col, 1);
    % Position
    [X, Y] = meshgrid(xMin:xMax, yMin:yMax);
    % X
    Xval = L(yMin:yMax, xMin:xMax) .* X;
    Xval = sum(sum(Xval)) / totL;
    % Y
    Yval = L(yMin:yMax, xMin:xMax) .* Y;
    Yval = sum(sum(Yval)) / totL;
    [r, c] = size(L);
    newLight = struct('color', value, 'x', Xval / c, 'y', Yval / r);
else
    newLight = [];
end
end

Listing 5.6. Matlab Code: Generate light in the region for median cut algorithm.

Figure 5.7. MCS for IBL. (a) The environment map. (b) A visualization of the cuts and samples for 32 samples.

Listing 5.4 shows the Matlab code for MCS, which may be found in the function MedianCut.m under the IBL folder. The input for this function is the HDR environment map, using a latitude-longitude mapping, stored in img and the number of lights to be generated in nlights. The falloff flag can be set to off if the fall-off of the environment map is already premultiplied into the input environment. This code initializes a set of global variables, and the luminance of the image is computed and stored in L. Other global variables are used to facilitate the computation. The function then calls the MedianCutAux.m function, with the initial dividing axis along the longest dimension. MedianCutAux.m may be found under the IBL/util folder, represents the recursive part of the computation, and can be seen in Listing 5.5. This function computes the sum of the luminance in the region and then identifies the pivot point at which to split, depending on the chosen axis. Finally, when the termination conditions are met, the light sources are generated at the centroids of the computed regions using the function CreateLight.m and stored into lights, assigning the average color of each region. The code for CreateLight.m is given in Listing 5.6 and may be found under the IBL/util folder.
After the generation of light sources, Equation (5.6) is evaluated as

$$L(x, \omega) = L_e + \sum_{i=1}^{N} C_i\, f_r(\omega_i, \omega)\,(\mathbf{n} \cdot \omega_i)\, V(x, \omega_i), \quad (5.7)$$

where N is the number of generated light sources, ω_i is the direction of the i-th generated light source, and C_i is its corresponding color. Figure 5.7 and Figure 5.8 show an example.
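As a sketch of how the generated lights are then used, the following evaluates Equation (5.7) for a single Lambertian shading point with the lights produced by MedianCut.m, ignoring visibility (V = 1) and emission (L_e = 0). Each light's normalized (x, y) position is turned back into a direction with the latitude-longitude parameterization of Equation (5.1); the exact radiometric normalization of the light colors is glossed over, and the function and variable names are illustrative.

function Lo = EvalLightsSketch(lights, albedo, n)
% lights: array of structs from MedianCut.m. albedo: 1 x 3. n: unit normal (3 x 1).
Lo = zeros(1, 3);
for i = 1:length(lights)
    % Light direction from its normalized (x, y) position, as in Equation (5.1).
    phi   = 2 * pi * lights(i).x;
    theta = pi * (lights(i).y - 1);
    wi = [sin(phi) * sin(theta); cos(theta); -cos(phi) * sin(theta)];
    cosTerm = max(dot(n, wi), 0);              % (n . w_i), clamped
    Ci = reshape(lights(i).color, 1, 3);       % light color C_i
    Lo = Lo + Ci .* (albedo / pi) * cosTerm;   % Lambertian BRDF = albedo / pi
end
end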
The light source generation methods result in less noisy images, so they are ideal for animated scenes, where the geometry and camera may be dynamic but the environment map is a still image. However, the methods can present aliasing if only a few light sources are generated, depending on the radiance distribution and the dynamic range of the environment map.

Figure 5.8. An example of the evaluation of Equation (5.7) using MCS [54] with different N. (a) N = 16; note that aliasing artifacts can be noticed. (b) N = 256; aliasing is alleviated.

Figure 5.8 shows an example of aliasing artifacts caused by the limited number of generated lights.

5.2.2 Monte-Carlo Integration and Importance Sampling


Another popular method for IBL is to use Monte-Carlo integration. This uses random sampling to evaluate complex multidimensional integrals, as in the case of Equation (5.6). As an example, the integral of a one-dimensional function, f(x), over the domain [a, b] is usually solved as

$$I_{ab} = \int_a^b f(x)\, dx = F(b) - F(a), \qquad F'(x) = f(x).$$

However, it may not be possible to find the antiderivative F(x) analytically, as is the case for a normal distribution, or f(x) may be known at only a few points of the domain. In Monte-Carlo integration [175], the integral is calculated by averaging the value of f(x) at N points distributed over the domain, assuming Riemann integrals:

$$I_{ab} \approx \frac{b - a}{N}\sum_{i=1}^{N} f(x_i), \qquad I_{ab} = \lim_{N \to +\infty} \frac{b - a}{N}\sum_{i=1}^{N} f(x_i), \quad (5.8)$$

where x_1, x_2, ..., x_N are uniformly distributed random points in [a, b]. Random points are used because deterministically chosen points [175] do not work efficiently in the case of multidimensional integrals: to integrate a multidimensional function with deterministic points, equidistant point grids are needed, which become very large (N^d). Here N is the number of points per dimension and d is the number of dimensions of f(x).
The convergence of Monte-Carlo integration (Equation (5.8)) is determined by the variance of the estimator, whose error decreases as N^{-1/2}; this means that the number of samples has to be quadrupled to halve the error. A technique that reduces variance is called importance sampling. Importance sampling solves the integral by taking points x_i that contribute more to the final result. This is achieved by using a probability density function p(x) whose shape corresponds to that of f(x):

$$I_{ab} \approx \frac{1}{N}\sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)}.$$

Figure 5.9. A comparison between Monte-Carlo integration methods for IBL. (a) Monte-Carlo integration using 16 samples per pixel. (b) Importance sampling Monte-Carlo integration using Pharr and Humphreys' importance sampling with 16 samples per pixel. (c) Monte-Carlo integration using 128 samples per pixel. (d) Importance sampling Monte-Carlo integration using Pharr and Humphreys' importance sampling with 128 samples per pixel. (The three-dimensional model of Nettuno is courtesy of the VCG Laboratory ISTI-CNR.)

Figure 5.10. Pharr and Humphreys' importance sampling for IBL. (a) The environment map. (b) A visualization of a chosen set of 128 samples.

Note that the convergence rate is still the same, but a good choice of p(x) can make the variance arbitrarily low. The optimal case is when p(x) = f(x)/I_{ab}. To create samples, x_i, according to p(x), the inversion method can be applied. This method calculates the cumulative distribution function P(x) of p(x); then samples are generated as x_i = P^{-1}(y_i), where y_i ∈ [0, 1] is a uniformly distributed random number.
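The following is a minimal sketch of the inversion method for a discrete, tabulated pdf, such as the luminance of one column of an environment map: the normalized cumulative sum plays the role of P(x), and a uniform random number u is mapped back through it. The function name is illustrative; the HDR Toolbox wraps this logic in Create1DDistribution.m and Sampling1DDistribution.m, whose internals may differ.

function idx = SampleInverseCDFSketch(weights, u)
% weights: non-negative importance of each bin (e.g., luminance values).
% u: uniform random number in [0,1].
cdf = cumsum(weights(:));
cdf = cdf / cdf(end);              % normalized CDF, P(x)
idx = find(cdf >= u, 1, 'first');  % inversion: smallest bin with P(x) >= u
end

For example, idx = SampleInverseCDFSketch(L(:, k), rand()) picks a row of column k of the luminance image L with probability proportional to its luminance, which is exactly the behavior exploited in Listing 5.7.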
Importance sampling can be straightforwardly applied to the IBL problem, extending the problem to more than one dimension [174]. Good choices of p(x) are the luminance of the environment map image, l(ω'), the BRDF, f_r(ω, ω'), or a combination of both. An example of the evaluation of IBL using Monte-Carlo integration is shown in Figure 5.9. Monte-Carlo methods are unbiased; they converge to the real value of the integral, but they have the disadvantage of noise, which can be alleviated with importance sampling.
Listing 5.7, which may be found in the ImportanceSampling.m function under the IBL folder, provides the Matlab code for Pharr and Humphreys' importance sampling method [174], which uses the luminance values of the environment map for importance sampling. This method creates a cumulative distribution function (CDF) over the rows of each column, based on the luminance (computed in L), and a CDF over the columns of the input environment map img. The code demonstrates the construction of the row and column CDFs stored in rDistr and cDistr, respectively. The generation of nSamples subsequently follows. For each sample, two random numbers are generated and used to obtain a column and a row, effectively with a higher probability of sampling areas of high luminance. The code outputs both the samples and a map, imgOut, visualizing where the samples are placed. It is important to note that within a typical rendering environment, such as Pharr and Humphreys' physically based renderer [174], the creation of the CDFs is computed once before the rendering phase. In the rendering phase, a number of samples to the environment is generated whenever shading via the environment is required. Our Matlab code would only apply to one of these shading points. The results of running Pharr and Humphreys' importance sampling for the environment map in Figure 5.10(a) with 128 samples can be seen in Figure 5.10(b).
function [imgOut, samples] = ImportanceSampling(img, falloff, nSamples)
if (falloff)
    img = FallOffEnvMap(img);
end
% Luminance channel
L = lum(img);
[r, c] = size(L);
% Creation of 1D distributions for sampling
cDistr = [];
values = zeros(c, 1);
for i = 1:c
    % 1D distribution of column i
    tmpDistr = Create1DDistribution(L(:, i));
    cDistr = [cDistr, tmpDistr];
    values(i) = tmpDistr.maxCDF;
end
rDistr = Create1DDistribution(values);
% Sampling
samples = [];
imgOut = zeros(size(L));
pi22 = 2 * pi^2;
for i = 1:nSamples
    % Random values in [0,1]
    u = rand(2, 1);
    % Sampling the rDistr (which column)
    [val1, pdf1] = Sampling1DDistribution(rDistr, u(1));
    % Sampling the cDistr (which row of that column)
    [val2, pdf2] = Sampling1DDistribution(cDistr(val1), u(2));
    phi = pi * 2 * val1 / c;
    theta = pi * val2 / r;
    vec = PolarVec3(theta, phi);
    pdf = (pdf1 * pdf2) / (pi22 * abs(sin(theta)));
    samples = [samples, struct('dir', vec, 'color', img(val2, val1, :), 'pdf', pdf)];
    imgOut(val2, val1) = imgOut(val2, val1) + 1;
end
end

Listing 5.7. Matlab Code: Importance sampling of the environment map using the Pharr and Humphreys method.

The importance sampling method of Pharr and Humphreys, and other methods that exclusively sample the environment map, may not be ideal when computing illumination for specular surfaces, as the chosen samples are independent of the BRDF and the contribution in the chosen directions may not be ideal. Similarly, importance sampling of only the BRDF may result in significant contributions of the incident lighting being overlooked. Ideally, all terms of the rendering equation are considered. Multiple importance sampling [209] was the first technique to introduce sampling of more than a single term to computer graphics. It presented a generic method for importance sampling of multiple terms. Other methods have been more specific to importance sampling in the case of IBL. Burke et al. [28] presented bidirectional importance sampling (BIS). BIS used rejection sampling and sampling importance resampling (SIR) to obtain samples from the product of the environment map and the BRDF. Rejection sampling requires an unknown number of retries to generate the product samples. SIR does not require retries, so the number of samples generated can be bounded. SIR was concurrently presented by Talbot et al. [198] in the context of IBL to importance sample the product of the BRDF and the environment map, where an initial set of samples drawn from one of the distributions is subsequently weighted and resampled to account for the second term. If both the lighting and the BRDF have high frequencies, SIR may not be ideal, as it becomes difficult to obtain samples that are representative.
Wavelet importance sampling [41] also performed importance sampling of the BRDF and the luminance of the environment map by storing the lighting and BRDF as sparse Haar wavelets. This method uses precomputation for computing the wavelets and may require considerable memory for anything but low-resolution lighting. This work was further extended [40] to remove such limitations by sampling the BRDF in real time and building a hierarchical representation of the BRDF. This allows the support of arbitrary BRDFs and complex materials such as procedural shaders. The product with the lighting is computed by multiplying the lighting, represented as a mip-map, with the BRDF hierarchy, enabling much higher resolution environment maps. Results showed how this method compared favorably with the previously discussed methods.
While these methods all account for two of the terms in the rendering equation, occlusion, represented by V in Equation (5.6), is not taken into
account. Clarberg and Akenine-Möller [39] addressed this by using control variates: the occlusion is approximated with a visibility cache that provides a quick approximation of the lighting, which is in turn used to reduce variance.

5.2.3 PRT for Interactive IBL


A series of methods have been developed to enable the use of IBL for interactive rendering. As mentioned earlier, environment maps have been used for rendering diffuse surfaces by filtering the environment maps. Ramamoorthi and Hanrahan [178] introduced a method to efficiently store an irradiance environment map representation by projecting it onto basis functions. At runtime the irradiance can be computed by evaluating this representation. Ramamoorthi and Hanrahan used spherical harmonic polynomials without needing to access a convolved environment map. This method did not take occlusion and inter-reflections into account, but it served to inspire a series of techniques that did; these were termed precomputed radiance transfer (PRT) [191] techniques. PRT, as the name implies, requires a precomputation stage that computes the lighting and the transfer components of rendering and can then compute the illumination in real time. These methods are suitable for interactive applications and have been adopted by the games industry, since they are very fast to compute. Once the precomputation stage, which may be rather expensive, is finalized, this computation essentially requires only the evaluation of dot products.
Assuming only diffuse surfaces entails that the BRDF is dependent only on x, and we can adapt the modified rendering equation we used for IBL previously while ignoring L_e. Equation (5.6) then becomes

$$L(x, \omega) = \int_{\Omega} L(\omega')\, V(x, \omega')\,(\mathbf{n} \cdot \omega')\, d\omega'.$$

PRT projects the lighting (L) and the transfer functions (the rest of the integral) onto an orthonormal basis. L(ω') can be approximated as

$$L(\omega') \approx \sum_k l_k\, y_k(\omega'),$$

where y_k are the basis functions. In this case we assume spherical harmonics, as used by the original PRT method, and l_k are the lighting coefficients computed as

$$l_k = \int_{\Omega} L(\omega')\, y_k(\omega')\, d\omega'.$$

In this instance, l_k is computed using Monte-Carlo integration, similar to the method described in the previous section. The transfer functions are similarly evaluated using Monte-Carlo integration based on

$$t_k = \int_{\Omega} y_k(\omega')\, V(x, \omega')\,(\mathbf{n} \cdot \omega')\, d\omega', \quad (5.9)$$

where, in this case, the computation may be rather expensive, as V(x, ω') needs to be evaluated via ray casting.
The computation of the lighting coefficients, l_k, and the transfer coefficients, t_k, represents the precomputation aspect of PRT. Once these are computed, the lighting can be evaluated as

$$L(x, \omega) = \sum_k l_k\, t_k, \quad (5.10)$$

which is a straightforward dot product that current graphics hardware is ideally suited to computing interactively.
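The following is a minimal sketch of the two PRT ingredients discussed above: a Monte-Carlo projection of the environment map onto an arbitrary set of basis functions (the lighting coefficients l_k), and the per-vertex dot product of Equation (5.10). The basis is passed as a function handle and the names are illustrative; the HDR Toolbox listings shown in this chapter do not include a PRT implementation.

function l = ProjectLightingSketch(img, basis, nSamples)
% img: latitude-longitude HDR map (r x c x 3).
% basis: handle; basis(D) returns an nCoeffs x 1 vector for a direction D (3 x 1).
[r, c, ~] = size(img);
l = zeros(numel(basis([0; 1; 0])), 3);
for i = 1:nSamples
    % Uniformly distributed direction on the sphere.
    z = 2 * rand() - 1;  phi = 2 * pi * rand();
    D = [sqrt(1 - z^2) * cos(phi); z; sqrt(1 - z^2) * sin(phi)];
    % Environment map look-up at this direction (Equation (5.2)).
    x = min(max(round((1 + atan2(D(1), -D(3)) / pi) / 2 * c), 1), c);
    y = min(max(round(acos(D(2)) / pi * r), 1), r);
    Le = squeeze(img(y, x, :))';            % 1 x 3 radiance sample
    l = l + basis(D) * Le;                  % accumulate y_k(D) * L(D)
end
l = l * (4 * pi / nSamples);                % Monte-Carlo estimate of l_k
end

Given precomputed transfer coefficients t (nCoeffs x 1) for a vertex, Equation (5.10) then reduces to Lout = t' * l, one dot product per color channel.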
The example above would only compute direct lighting from the environment, ignoring the light transport from secondary bounces required for global illumination. This can be added by considering the Neumann expansion:

$$L(x, \omega) = L_e(x, \omega) + L_1(x, \omega) + L_2(x, \omega) + \ldots,$$

where L_e is the light emitted and L_i is the lighting at x after the ith bounce. This leads to a global illumination version of Equation (5.10):

$$L(x, \omega) = \sum_k l_k\,(t_k^0 + t_k^1 + \ldots),$$

where t_k^0 is equivalent to t_k calculated in Equation (5.9), and t_k^i represents the transfer coefficients at the ith bounce. See Green [75] for further details.
The PRT described so far is limited to diffuse surfaces and low frequency lighting effects. Since the original publication, a number of researchers have sought to eliminate the limitations of the original method. Ng et al. [157] used wavelets instead of spherical harmonics to include high frequency effects. Zhou et al. [242] compute dynamic shadows. Ng et al. [158] generalize the approach to all-frequency illumination for arbitrary BRDFs by decoupling visibility and materials, thus using three sets of coefficients; Haar wavelets were used as basis functions. Further extensions have allowed light inter-reflections for dynamic scenes [93, 162].

5.2.4 Rendering with More Dimensions

Adelson and Bergen [4] described the amount of radiance based on the seven-dimensional plenoptic function:

$$P(x, y, z, \theta, \phi, \lambda, t),$$

where (x, y, z) denotes the three-dimensional location at which the incident lighting is captured, (θ, φ) describes the direction, λ describes the wavelength of the light, and t the time.
The IBL we have demonstrated until now in this chapter fixes (x, y, z) and t and would usually use three values for λ (red, green, and blue). Effectively, we have been working with P(θ, φ) for red, green, and blue. This entails that lighting is based on a single point, infinitely distant illumination, at one point in time, and it cannot capture lighting effects such as shadows, caustics, and shafts of light. Recently, research has begun to look into IBL methods that take (x, y, z) and t into account.
Spatially varying IBL. Sato et al. [188] made use of two omnidirectional cameras to capture two environment maps corresponding to two spatial variations in the plenoptic function. They used stereo feature matching to construct a measured radiance distribution in the form of a triangular mesh, the vertices of which represent light sources. Similarly, Corsini et al. [43] proposed to capture two environment maps for each scene and to solve for spherical stereo [120]. In this case, the more traditional method of using two steel balls instead of omnidirectional cameras was used. Once the geometry of the scene is extracted, omnidirectional light sources are generated for use in the three-dimensional scene. The omnidirectional light sources make this representation more amenable to modern many-light rendering methods, such as lightcuts [214]. Figure 5.11 shows an example of Corsini et al.'s method.
Figure 5.11. An example of stereo IBL by Corsini et al. [43] using the VCG Laboratory ISTI-CNR's Laurana model. (a) The photograph of the original model. (b) The relit three-dimensional model of (a) using the stereo environment map technique. Note that local shadowing is preserved as in the photograph. (Images are courtesy of Massimiliano Corsini.)

Unger et al. [206] also calculated spatial variations in the plenoptic function. Their method, at the capture stage, densely generated a series of

environment maps to create what they term an incident light field (ILF), after the light fields presented by Levoy and Hanrahan [118]. Unger et al. presented two capture methods. The first involved an array of mirror spheres and capturing the lighting incident on all these spheres. The second device consisted of a camera mounted onto a translational stage that would capture lighting at uniform positions along the stage. The captured ILF is then used for calculating the lighting inside a conventional ray tracing-based renderer. Whenever a ray hits the auxiliary geometry (typically a hemisphere) representing the location of the light field, the ray samples the ILF and bilinearly interpolates directionally and spatially between the corresponding captured environment maps. Unger et al. [207] subsequently extended this work, which took an infeasibly long time to capture the lighting, by using the HDR video camera [205] described in Section 2.1.2. This method allowed the camera to roam freely, with the spatial location being maintained via motion tracking. The generated ILF consisted of a volume of thousands of light probes, and the authors presented methods for data reduction and editing. Monte-Carlo rendering techniques [174] were used for fast rendering of the ILF. Figure 5.12 shows an example of this method.

Figure 5.12. An example of the dense sampling method by Unger et al. [206, 207]; the synthetic objects on the table are lit using around 50,000 HDR environment maps captured at Linköping Castle, Sweden. (Image is courtesy of Jonas Unger.)


Temporally varying IBL. As HDR video becomes more widespread, a number of methods that support lighting for IBL from dynamic environment maps, effectively corresponding to the change of t in the plenoptic function, have been developed. These methods take advantage of temporal coherence rather than recomputing the samples each frame, which may result in temporal noise.
Havran et al. [82] extended the static environment map importance sampling from their previous work [81] to be applicable in the temporal domain. Their method uses temporal filters to filter the power of the lights at each frame and the movement of the lights across frames. Wan et al. [215] introduced the spherical Q²-tree, a hierarchical data structure that subdivides the environment map into equal quadrilaterals proportional to solid angles in the environment map. For static environment maps, the Q²-tree creates a set of point lights based on the importance of the environment map in that area, similar to the light source generation methods presented in Section 5.2.1. When computing illumination due to a dynamic environment map, the given frame's Q²-tree is constructed from that of the previous frame. The luminance of the current frame is inserted into the Q²-tree, which may result in inconsistencies since the Q²-tree is based on the previous frame, so a number of merge and split operations update the Q²-tree until the process converges to that of a newly built Q²-tree. However, to maintain coherence amongst frames and avoid temporal noise, the process can be terminated earlier based on a tolerance threshold.
Ghosh et al. [72] presented a method for sampling dynamically changing environment maps by extending the BIS method [28] (see Section 5.2.2)
into the temporal domain. This method supports product sampling of environment map luminance and BRDF over time. Sequential Monte-Carlo
(SMC) was used for changing the weights of the samples of a distribution during consecutive frames. BIS was used for sampling in the initial
frames. Resampling was used to reduce variance as the increase in number
of frames could result in degeneration of the approximation. Furthermore,
Metropolis-Hastings sampling was used for mutating the samples between
frames to reduce variance. In the presented implementation, the samples
were linked to a given pixel, and SMC was applied after each frame based
on the previous pixel's samples. When the camera moved, the pixel samples were obtained by reprojecting the previous pixels' locations. Pixels without previous samples were computed using BIS.
Figure 5.13. An example of a light stage. (a) A sample of six images from a database of captured light directions. (b) The relit scene captured in (a) using an environment map. (The Grace Cathedral environment map is courtesy of Paul Debevec.)

Virtual relighting. The plenoptic function considers only the capture of fixed lighting at different positions, times, orientations, and wavelengths. If we want the ability to change both the viewpoint and the lighting, we need to consider a further number of factors, such as the location, orientation, wavelength, and timing of the light. Debevec [55] considers this the reflectance field R, a 14-dimensional function accounting for both an incident ray of light L_i and the plenoptic function for the radiant light L_r, given by

$$R = R(L_i; L_r) = R(x_i, y_i, z_i, \theta_i, \phi_i, \lambda_i, t_i;\ x_r, y_r, z_r, \theta_r, \phi_r, \lambda_r, t_r),$$
where each term is equivalent to that in the plenoptic function, for both the light and the view. When considering applications that make use of R, a number of approximations need to be taken into account. One popular application is the light stage and its various successors [52] (see Figure 5.13). These provide the ability to virtually relight actors with arbitrary lighting after their performance has been captured. The light stage captures the performance of an actor inside a rig surrounded by a multitude of lights.
The light rig illuminates the actor's face from each of the individual lights while the camera captures the actor's expression. The video capture accounts for θ_r and φ_r, and the light rig for θ_i and φ_i. The view and light are considered static, so (x_r, y_r, z_r) and (x_i, y_i, z_i) are constant. The time taken for the light to reach the actor's face is considered instantaneous, eliminating t_i. Similarly, the wavelength of light is not considered to be changing, removing λ_i and fixing it to the three red, green, and blue capture channels. The reflectance field can thus be approximated as R(θ_i, φ_i; θ_r, φ_r, λ_r, t_r). The lights represent a basis and are subsequently used to relight the actor's face. The additive properties of light mean that the light values can be scaled by the contribution of the light representing that position in an environment map, giving the impression that the actor is lit from the environment map.
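As a sketch of this additive relighting, suppose basisImgs{i} is the image of the actor lit by the i-th rig light and dirs(:, i) is that light's direction; relighting against an environment map then amounts to a weighted sum, where each weight is the environment color looked up in the light's direction (here with the latitude-longitude mapping of Equation (5.2)). The variable and function names are illustrative and not part of the HDR Toolbox.

function relit = RelightSketch(basisImgs, dirs, envMap)
% basisImgs: cell array of basis images (one per rig light).
% dirs: 3 x nLights light directions. envMap: latitude-longitude HDR map.
[r, c, ~] = size(envMap);
relit = zeros(size(basisImgs{1}));
for i = 1:length(basisImgs)
    D = dirs(:, i) / norm(dirs(:, i));
    % Latitude-longitude look-up of the light's color, as in Equation (5.2).
    x = min(max(round((1 + atan2(D(1), -D(3)) / pi) / 2 * c), 1), c);
    y = min(max(round(acos(D(2)) / pi * r), 1), r);
    w = envMap(y, x, :);                     % per-channel weight for this light
    for ch = 1:3
        relit(:,:,ch) = relit(:,:,ch) + w(ch) * basisImgs{i}(:,:,ch);
    end
end
end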

5.3 Summary

The widespread use of HDR has brought IBL to the forefront as one of its major applications. IBL has rapidly emerged as one of the most studied rendering methods and is now integrated in most rendering systems. The various techniques used at different ends of the computation spectrum have allowed simple methods, such as environment mapping, to be used extensively in games, while more advanced interactive methods, such as PRT and its extensions, are gaining a strong following in interactive environments. More advanced methods have been used and continue to be used in cinema and in serious applications, including architecture and archaeology. As the potential for capturing more aspects of the plenoptic function (and indeed reflectance fields) increases, the ability to relight virtual scenes with real lighting will create many more possibilities and future applications.

6 Evaluation

As we have shown in Chapter 3 and Chapter 4, many techniques have been proposed for tone mapping and luminance expansion. With such a large number of techniques, it is useful to understand the relative merits of each. As a consequence, several methodologies have now been put forward that evaluate and compare the variety of approaches. Evaluation of the techniques provides a better understanding of the relationship between an operator and the image attributes. This can help in the development of operators more suited to a particular application, for example, an improved method for highlighting those parts of an image that are perceptually important to the HVS of a colorblind viewer. Evaluation methods may be classified as:

Psychophysical experiments. In this case (large) user studies are performed to visually compare images. This typically involves comparing a ground truth image with the output result of a particular technique.

Error metrics. Computer metrics are used to compare images. These can be simple computational differences or metrics that simulate the behavior of the HVS to find the perceptual differences between images that a human observer would perceive.

6.1 Psychophysical Experiments

The setup of a psychophysical experiment used to compare TMO techniques is slightly different from the one used to compare expansion methods. In both cases, in order to avoid interference from other stimuli, the


Figure 6.1. An example of the setup for the evaluation of TMOs using an HDR
monitor as reference. (a) The diagram. (b) A photograph. (The photograph is
courtesy of Patrick Ledda [114].)

experiments are performed in a dark room with full control of the lighting
conditions. A typical setup, in the case of TMO evaluation, is shown in
Figure 6.1 where an HDR display is used as the reference and two LDR
displays are used to show the TMO outputs. Real scenes have also been
used as the reference. Where it is not possible to use a real scene or the
HDR image of that scene, the tone mapped images are displayed on LDR
screens and compared with each other. A setup for the evaluation of expansion methods is shown in Figure 6.2. Here a single HDR display is used
and three images (the results of two expansion methods and the reference
HDR image) are displayed on it side by side. Typically the HDR reference image is shown in the center of the display and the results of the two
expansion methods to be compared are on either side of the reference.


Figure 6.2. An example setup for the evaluation of expansion methods using an
HDR monitor. (a) The diagram. (b) A photograph.

Independently of which type of technique is compared, participants should be chosen with normal or corrected-to-normal vision and carefully instructed about the task to be performed. The participant usually performs the task by interacting with a computer interface that collects the data. In order to avoid biasing the judgment of the observers, an average gray image is shown between one stimulus observation (image) and the next. There are three main kinds of psychophysical experiments that have been used in the evaluation of TMOs:

Ranking. Each participant has to rank a series of stimuli based on a criterion. In this case results are more accurate and precise because a human being has to make a firm decision. The main disadvantage is that the task is time consuming, because a participant has to compare all images. Ranking can be performed indirectly using pairwise comparisons.

Rating. A participant has to rate an attribute of a stimulus on a scale. A reference or another stimulus can be included. This method is very fast to perform because all other stimuli are not needed. However, participants can have different perceptions of the rating scale, so the collected data can be less precise than in ranking experiments.

Pairwise comparisons. A participant needs to determine which image in a pair is closer to a reference, or which image in a pair better satisfies a certain property. The method produces data with fewer subjective problems than rating, and there are standard statistical methods to determine the significance of inferred ranks, unlike ranking. Nevertheless, the method usually requires more time than rating and ranking. Each participant needs to evaluate nt(t - 1)/2 pairs, where t is the number of TMOs and n is the number of images; for example, comparing t = 6 TMOs on n = 23 images requires 23 × 6 × 5/2 = 345 pairs. This technique has also been used for comparing expansion methods (see Section 6.1.10).

6.1.1 Perceptual Evaluation of Tone Mapping Operators with Regard to Similarity and Preference

One of the first studies in computer graphics on tone mapping evaluation was conducted by Drago et al. [58], where the main goal was to measure the performance of TMOs applied to different scenes. The study consisted of two parts. The first was a psychophysical experiment in which participants answered subjective questions. The second was a multidimensional statistical analysis inspired by Pellacini et al.'s work [169]. Seven TMOs were tested: LCIS [203], revised Tumblin-Rushmeier [204], photographic tone reproduction [180], uniform rational quantization [189], histogram adjustment [110], Retinex [177], and visual adaptation [68].
The results of the analysis showed that the photographic tone reproduction operator tone mapped images closest to the ideal point extracted from the participants' preferences. Moreover, this operator, the uniform rational quantization, and the Retinex methods were in the same group for better-looking images. This is probably due to their global contrast reduction operators, which share many common aspects. While visual adaptation and revised Tumblin-Rushmeier were in the second group, the histogram adjustment was between the two groups.
The study presented a methodology for measuring the performance of a TMO using subjective data. However, the main problem was the number of people who took part (11 participants) and the size of the data set (four images), which were too small for drawing significant conclusions. Drago et al. [59] used these findings to design a Retinex TMO; however, they subsequently failed to evaluate whether it did indeed reach the desired quality.

6.1.2 Evaluating HDR Rendering Algorithms

Kuang et al. conducted two studies [105, 107] on the evaluation of TMOs. The first one [105] showed the correlation in ranking between colored and grayscale images. More interesting was the second study, which extended the first one by introducing a methodology for testing TMOs for overall preference and accuracy. This study was divided into three experiments and involved 33 participants. The first experiment's goal was to determine participants' preferences amongst TMOs. The second one was designed to measure attributes that help to choose a TMO among others. Finally, the third experiment attempted to measure the accuracy of a TMO at reproducing real-world scenes. Six TMOs were tested: histogram adjustment [110], Braun and Fairchild's sigmoid transform for color gamut [27], Funt et al.'s Retinex [70], iCAM 2002 [63], a modified fast bilateral filtering [62], and photographic tone reproduction [180].
The results of the first experiment showed that colored and grayscale images were correlated in the ranking. The second experiment highlighted that overall preference was correlated with details in dark areas, overall contrast, sharpness, and colorfulness. Moreover, the overall appearance can be predicted by a single attribute. Finally, the third one showed that the data of the first part (preferences) and the data of the second part (accuracy with respect to the real world) were correlated. Therefore, the authors suggested that both methodologies can be utilized as an evaluation paradigm.
In all experiments the modified fast bilateral filtering performed well, suggesting that it is a good algorithm for tone mapping.

6.1.3 Paired Comparisons of Tone Mapping Operators Using an HDR Monitor

One of the first studies using an HDR reference was presented by Ledda et al. [114], where an HDR monitor was employed as the ground truth. This study used a pairwise comparison methodology with a reference [49]. Forty-eight participants took part in the experiments, and 23 images were tone mapped using six TMOs: the histogram adjustment operator [110], the fast bilateral filtering operator [62], the photographic tone reproduction operator [180], iCAM 2002 [63], adaptive logarithmic mapping [60], and the local eye adaptation operator [113].
The collected data was analyzed using accumulated pairwise preference scores [49] in combination with coefficients of agreement and consistency [97]. The study showed that, on the whole, the photographic tone reproduction and iCAM 2002 performed better than the other operators, but in the case of grayscale images, the first operator was superior. This is due to the fact that iCAM primarily processes colors, and it is disadvantaged in a grayscale context. Moreover, iCAM 2002 and the local eye adaptation operator performed very well for detail reproduction in bright areas. In conclusion, iCAM 2002 performed generally better than the other operators, because colors are an important stimulus in human vision. Furthermore, the study highlighted low performance for the fast bilateral filtering operator. This may be due to the fact that high frequencies are exaggerated with this method when compared to the reference.
This study presented a robust methodology, involving a large number of participants, and a large data set was used covering a variety of scene categories such as day, night, outdoor, indoor, synthesized images, and images captured from the real world.

6.1.4 Testing TMOs with Human-Perceived Reality

A different evaluation approach was proposed by Yoshida et al. [239, 241]. They ran a series of subjective psychophysical experiments using, as reference, a real-world scene. The main goal of this study was to assess the differences in the perception of different TMOs against the real world. This was achieved by measuring some attributes, including image naturalness, overall contrast and brightness, and detail reproduction in dark and bright areas of the image. Fourteen participants took part, and seven TMOs were tested: linear mapping, histogram adjustment [110], time-dependent visual adaptation [168], the Ashikhmin operator [17], fast bilateral filtering [62], photographic tone reproduction [180], and adaptive logarithmic mapping [60].
The first finding was that there was no statistical difference between the perception of the two scenes. The second was that the perception of naturalness seems to have no relationship with the other attributes. While global TMOs performed better than local ones in terms of overall brightness and contrast, local TMOs performed better for detail reproduction in bright areas. However, Ashikhmin's operator and adaptive logarithmic mapping performed better than the others for detail reproduction in dark areas. Finally, the most natural operators were the photographic tone reproduction, histogram adjustment, and adaptive logarithmic mapping. The authors also compared these results with Ledda et al.'s study [114] and Kuang et al.'s studies [105, 107], noticing that their study shared some similar results with this previous work.
This study analyzed five attributes of an image. The analysis determined which TMOs perform better than others for a given task, such as reproduction of contrast, brightness, detail in bright and dark regions, and naturalness. The novelty of the work was to compare the image directly with a real scene, although this work was limited in that only two indoor scenes with artificial illumination were used.

6.1.5 A Reality Check for Tone Mapping Operators

Another study on TMO evaluation, by Ashikhmin and Goyal [16], was based on ranking with real scenes as references. The study explored how people perceive real scenes when compared to TMOs, the realism of TMOs, and personal preference among them. Fifteen participants took part in the experiments, and five TMOs were tested: histogram adjustment [110], gradient domain compression [67], photographic tone reproduction [180], adaptive logarithmic mapping [60], and trilateral filtering [36].
The study showed that the difference between experiments with and without the reference is large, in disagreement with Čadík et al. [30, 31] (see Section 6.1.9). The main problem of Ashikhmin and Goyal's study, similar to [239], which used a real scene as a reference, is the limited number of scenes that could be considered. These were indoor with artificial illumination. This is due to the problem of controlling the scene in an outdoor environment for the duration of the experiment. This does mean that it is not possible to generalize these results to all lighting conditions and scenes. Furthermore, the first experiment in this study asked for a personal preference, which is very subjective.

6.1.6 Perceptual Evaluation of Tone Mapping Operators Using the Cornsweet-Craik-O'Brien Illusion

Akyüz and Reinhard [13] conducted a study in which contrast in tone mapped images was isolated and evaluated for a better understanding of

the attribute, arguing that since the HVS is very sensitive to contrast, one of the main goals of a tone mapping operator should be to preserve it. This was achieved using synthesized stimuli. In particular, they chose to evaluate the Cornsweet-Craik-O'Brien illusion [42]. This is characterized by a ramp (see Figure 6.3(a)) between two flat regions that increases the perceived contrast

$$C = \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}},$$

where L_max and L_min are, respectively, the maximum and minimum luminance values of the ramp.

Figure 6.3. The Cornsweet-Craik-O'Brien illusion used in Akyüz and Reinhard's study [13]. (a) The profile needed to generate the illusion. (b) The stimulus image used in the experiment. The red lines indicate the scanlines evaluated in the experiment. (The stimulus image is courtesy of Ahmet Oğuz Akyüz [13].)
Thirteen participants took part in the experiment and seven TMOs were tested: histogram adjustment [110], revised Tumblin-Rushmeier [204], gradient domain compression [67], fast bilateral filtering [62], photographic tone reproduction [180], and iCAM 2002 [63].
The results of the experiment showed that the tone mapping operators preserve the Cornsweet illusion in an HDR image in different ways, either accentuating it or making it less pronounced. The authors also noticed that the strength of the Cornsweet illusion is altered differently for different regions, and this is due to the different ways the operators work. For local operators, this is generated by the so-called gradient reversal. For global operators, this is due to the shape of the compression curve, which affects their consistency across different regions of the input image.
A new methodology of comparison was presented without the need for a true HDR reference; only a slice of information was judged at a time. In fact, each scanline was LDR. The study focused on contrast reproduction, assessing that TMOs do not preserve the Cornsweet illusion in the same way. While some TMOs decrease the illusion because gradients are attenuated, others exaggerate the illusion by making the gradients more pronounced.

6.1.7 Evaluating Tone Mapping Algorithms for Rendering Nonpictorial (Scientific) HDR Images

Park and Montag [165] presented two paired-comparison studies to analyze the effect of TMOs on nonpictorial images. As defined by the authors, a nonpictorial image is an image captured outside of the visible wavelength region, such as hyperspectral data and astronomical and medical images. The authors applied the following nine TMOs to four nonpictorial images: linear mapping, spiral rendering (curved color path), sigmoid-lightness rescaling, localized sigmoid mapping, the Photoshop auto-levels tool, the iCAM 2002 [63] appearance model, the fast bilateral filtering operator [62], and the photographic tone reproduction operator [180]. The paired-comparison experiments were conducted without using an HDR monitor as reference. The two images processed with two different TMOs were compared side by side on a 23" Apple Cinema HD flat-panel LCD. Three paired-comparison experiments were conducted. In the first one, the goal was to judge the observers' preference to determine which TMO outputs the HDR images in a more preferable way (to judge the overall quality of the image). In this case, the task of the observers was to choose the image they preferred in each pair. In the second experiment the scientific usefulness of the images was judged. The observers were required to choose the image in each pair that they considered more scientifically useful. Furthermore, the second experiment was repeated online with expert observers of that particular image type. The participants were asked to choose the image from each pair that would be more useful based on their expertise. A third experiment was performed to evaluate the effect of the TMO techniques on target detection, such as a tumor in medical images. The target was treated as noise, generated as normally distributed random noise and then multiplied with a Gaussian filter to reduce the sharpness of the noise. The target was located in the three tone areas, where its size was inversely proportional to the tone luminance: dark (large), mid (medium), and high tone (small). In this last experiment the images (with and without target and processed with the same TMO) were displayed side by side, where the observers' task was to choose the image with the target.


The finding of the first two paired-comparison experiments was that the photographic tone reproduction operator [180] had the best average performance in both experiments. The overall finding was that the use of a particular TMO is related to the image type. In other words, there is no TMO that is optimal for all image types. The online experiment on scientific usefulness showed that the photographic tone reproduction operator [180] again performed well.
Despite the high number of observers who participated in both experiments, the number of images used was too limited. Just four images were used, and this is not enough to generalize the authors' findings. Furthermore, the results of the target detection experiment, which also used four images and only had two observations, did not correspond with the results of the previous two paired-comparison experiments. The authors pointed out that when the goal is target detection, the spatial structure of a TMO may affect the identification of the target in the tone mapped image. Although the number of images considered and the participants who took part were so low that no meaningful conclusions can be drawn, the results do further suggest that a TMO should be specific for the type of image being used.

6.1.8 Analysis of Reproducing Real-World Appearance on Displays of Varying Dynamic Range

In 2006, Yoshida et al. [240] presented a series of experiments where three basic TMO parameters (brightness, contrast, and color saturation) were manipulated by observers to identify the output tone characteristics that produce better perceived images. The authors conducted two experiments using an HDR display. In the first experiment, participants were asked to adjust an HDR image shown on the HDR display to make it look as good as possible. In the second experiment, a real-world scene was used as reference and an HDR image shown on the HDR display was adjusted to match the real-world reference. The second experiment also included simulated display devices with different dynamic ranges that varied the lowest and highest luminance output values of the HDR display. In total there were 24 participants in the experiments, but only eight of them did both. Twenty-five HDR images (fourteen outdoor and eight indoor scenes) were used for the first experiment and just three for the second one. The main goal of this work was to identify what the important properties of a TMO are and to use these for developing a better TMO. Due to the time-consuming nature of the psychophysical experiments, the authors decided to consider only a global TMO that involves linear scaling and shifting of color values in the logarithmic domain. Such a TMO is able to mimic several global TMOs.

Figure 6.4. Relation between the most correlated variables and the TMO parameters as in [240].

This generic TMO can be described by three parameters (brightness, contrast, and color saturation) and is modeled as

\log_{10} R' = c \log_{10} R + b,    (6.1)

\log_{10} Y' = 0.2126 \log_{10} R' + 0.7152 \log_{10} G' + 0.0722 \log_{10} B',

\log_{10} R'' = \log_{10} Y' + s (\log_{10} R' - \log_{10} Y'),    (6.2)

where the parameters b, c, and s represent the brightness, contrast, and color saturation, respectively. Y' is the new luminance value and R'' is the modified red color channel of the display. Equation (6.1) and Equation (6.2) are also applied to the green G and blue B color channels.
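As a concrete illustration, this generic TMO can be prototyped in a few lines of Matlab. The following fragment is only a sketch of Equation (6.1) and Equation (6.2); the function name and the per-channel handling are our own choices and are not part of Yoshida et al.'s implementation.

function imgOut = GenericLinearTMO(img, b, c, s)
% Sketch of the generic TMO of Equation (6.1) and Equation (6.2):
% linear scaling and shifting in the log10 domain followed by a
% saturation adjustment. img is a linear RGB HDR image.
epsilon = 1e-6;                    % avoids log10(0)
imgLog = log10(img + epsilon);
imgLog = c * imgLog + b;           % Equation (6.1): contrast c and brightness b
% luminance in the log domain, Equation (6.2)
lumLog = 0.2126 * imgLog(:,:,1) + 0.7152 * imgLog(:,:,2) + 0.0722 * imgLog(:,:,3);
imgOut = zeros(size(img));
for i=1:3
    % saturation s applied around the new luminance
    imgOut(:,:,i) = 10.^(lumLog + s * (imgLog(:,:,i) - lumLog));
end
end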
The outcome of these experiments can be summarized in three findings:

• A better understanding of how observers adjust the TMO parameters helps to derive a better parameterization of a linear TMO.

• The authors claimed to have proven that it is possible to predict the parameter estimation of the TMO, but due to the reduced number of images used in the experiments, they were unable to build a reliable model for such an estimation.

• As can be seen in Figure 6.4, the highest correlation found was for the contrast parameter.

Finally, this work provides some insights into how the dynamic range and brightness of a display influence the TMO parameters. For the 14 simulated monitors, the authors did not find any major difference in the strategy used by the observers to adjust images for LDR and HDR displays. However, as explained by the authors, this is task-related. When performing the task with the goal of identifying the best-looking image, observers tended to enhance the contrast, clipping large areas in the dark part of an image. However, when the goal was to achieve the best fidelity with a real-world scene, the observers avoided clipping in both dark and bright areas of the image, and the contrast was never increased much above the contrast of the original image. Concerning the observers' preference for displays of different capabilities, the outcome suggested that the observers prefer brighter displays primarily and displays with low minimum luminance secondly.

6.1.9 Image Attributes and Quality for Evaluation of Tone Mapping Operators

Čadík et al. [30, 31] presented a TMO evaluation study with subjective psychophysical experiments using real scenes as reference, similar to Yoshida et al.'s work [239, 241]. Both rating and ranking methodologies were employed. Furthermore, the collected data was fitted into different metrics. The main focus of the study was on some image and HVS attributes, including brightness or perceived luminance, contrast, reproduction of colors, and detail reproduction in bright and dark areas of the image. Ten participants took part in the two experiments, and 14 TMOs were employed: linear mapping, LCIS [203], revised Tumblin-Rushmeier [204], photographic tone reproduction [180], uniform rational quantization [189], histogram adjustment [110], fast bilateral filtering [62], trilateral filter [36], adaptive gain control [166], the Ashikhmin operator [17], gradient domain compression [67], contrast-based factor [222], adaptive logarithmic mapping [60], and spatially nonuniform scaling [35].
The first finding was that there is no statistical difference between the data of the two experiments, that is, between rating and ranking. Therefore, the authors suggested that, for a perceptual comparison of TMOs, ranking without a reference is enough. The second finding was the good performance of global methods over local ones. This fact is in line with other studies, such as Ledda et al. [114], Akyüz et al. [14], Drago et al. [58], and Yoshida et al. [239, 241]. In the last part of the study, the relationship between the overall image quality and the four attributes was analyzed and fitted into parametric models for generating image metrics.
The study measured the performance of a large number of TMOs. Furthermore, four important attributes of an image were measured, and not only the overall quality. However, the number of participants was small and the choice of scenes was very limited and did not cover other common real-world lighting conditions.

6.1.10 A Psychophysical Evaluation of Inverse Tone Mapping Techniques

Banterle et al. [23] proposed a psychophysical study for the evaluation of expansion algorithms based on the pairwise comparisons methodology [49, 114] using an HDR reference image displayed on the Dolby DR-37p HDR monitor [57]. The study involved 24 participants, and five algorithms were tested: Banterle et al. [19, 20] (B), Meylan et al. [145, 146] (M), Wang et al. [218] (W), Rempel et al. [182] (R), and Akyüz et al. [14] (A). The study was divided into two experiments. The first one tested the performance of the various expansion algorithms for the recreation of eight HDR images starting from clipped ones. A participant had to choose the picture in a pair that was closer to the reference: overall, in the dark areas, and in the bright ones. The second experiment investigated which expansion method performed best for recreating six HDR environment maps for IBL for three different materials: pure diffuse, pure specular, and glossy. Each participant had to choose the relit object (a teapot) in a pair of relit objects that was closer to a reference relit object.
For the first experiment, the monotonically increasing functions B, W, and R that enhance contrast nonlinearly performed better overall and were grouped together in many of the results. The linear method A, and to a lesser extent M, performed worst overall, reflecting that for still images complex methods recreate HDR perceptually better.
For the second experiment, the diffuse results showed few differences. This is mostly due to the fact that rendering with IBL consists of evaluating an integral, and during this integration small details may be lost. This is less true for perfectly mirror-like or highly glossy materials. However, in these cases, details of the environment map reflected in the objects may be too small to be seen, as was shown by the large groupings in the results. For more complex environment maps, the previously found ranking was reversed. Overall, the results clearly showed that the operators that performed best, as with the first experiment, were the nonlinear operators.
This study showed that more advanced algorithms that cater for quantization errors introduced during expansion of an LDR image, such as B, R, and W, can perform better than simple techniques that apply single or multiple linear scale expansions, such as A and M. The more computationally expensive methods B, R, and W are better at recreating HDR than simple methods. Even if a linear scale can elicit an HDR experience in an observer, as shown in [14], it does not correctly reproduce the perception of the original HDR image.

6.2 Error Metric

An error metric used to evaluate the similarity between images may use different approaches depending on what needs to be achieved. If the goal is to understand how two images are perceptually similar, then a simulation of the HVS mechanisms may help to identify perceived similarities or dissimilarities between the compared images. The main limitation of such an error metric based on the simulation of the HVS mechanisms is that its precision depends on how thoroughly the HVS has been simulated. Despite vision scientists developing a deeper understanding of the HVS over the last few decades, no error metric yet exists that fully simulates the HVS. Rather, these error metrics only simulate some aspects of the HVS. A typical example, used in the context of TMO comparison, is HDR-VDP [134, 135]. This widely used metric works only on the luminance channel without using any color information, which, of course, is a key stimulus in human vision.

6.2.1 Predicting Visible Difference in HDR Images

The main goal of the visual difference metric proposed by Mantiuk et al. [134, 135] is to predict visible differences in HDR images. The metric is an extension of the existing visual difference predictor (VDP) to HDR imaging. VDP is a very popular metric for LDR images based on a model of the HVS. The flow chart for HDR-VDP is shown in Figure 6.5.

Figure 6.5. Flow chart of the HDR-VDP metric by Mantiuk et al. [134, 135]. (The original HDR image is courtesy of Paul Debevec.)

As with other error metrics, HDR-VDP takes as input the reference and the tested images. The metric generates a probability map where each value represents how strongly the difference between the two images may be perceived by the HVS (see Figure 6.6). The probability map can be summarized in two values, N_{P(X=0.75)} and N_{P(X=0.95)}. The first one is the percentage of pixels that the HVS can find different with probability 0.75, and the second one is the percentage of pixels that the HVS can find different with probability 0.95. HDR-VDP can also be used with LDR images. In this case, the images need to be inverse gamma corrected and calibrated according to the maximum luminance of the display where they are visualized. In the case of HDR images, inverse gamma correction is not required, but the luminance must be expressed in cd/m² [134, 135].
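As a small illustration, given a probability map pmap produced by an HDR-VDP implementation, the two summary values can be computed with a couple of lines of Matlab. This is only a sketch based on one simple reading of the definitions above; the variable names are our own and are not part of the metric's code.

% pmap: per-pixel probability that the difference is visible, in [0, 1]
N075 = 100 * sum(pmap(:) >= 0.75) / numel(pmap);   % percentage of pixels with P >= 0.75
N095 = 100 * sum(pmap(:) >= 0.95) / numel(pmap);   % percentage of pixels with P >= 0.95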
The metric mainly simulates the contrast reduction in the HVS through the simulation of light scattering in the cornea, lens, and retina (optical transfer function [OTF]). It takes into account the nonlinear response of our photoreceptors to light (just noticeable difference [JND]). Because the HVS is less sensitive to low and high spatial frequencies, the contrast sensitivity function (CSF) is used to filter the input image. Afterwards, the image is decomposed into spatial and orientational channels and the perceived difference is computed (using the cortex transform and visual masking blocks). The phase uncertainty step is responsible for removing the dependence of masking on the phase of the signal, and finally the probabilities of visible differences are summed up for all channels, generating the difference probability map [134, 135].

Figure 6.6. An example of HDR-VDP. (a) The original HDR image. (b) A distortion pattern. (c) Image in (a) added to the distortion pattern (b). (d) The result of HDR-VDP: gray areas have no perceptual error, green areas have medium error, red areas have medium-high error, and purple areas have high error. Note that single exposure images are visualized in (a) and (c) to show the differences with the added pattern (b). (The original HDR image is courtesy of Paul Debevec.)

6.2.2 Dynamic Range Independent Image Quality Assessment

Due to the diversity in the dynamic range of images, the available error metrics are simply not able to compare images with different dynamic ranges. For example, it is not possible to compare an HDR image with an LDR image.
Aydin et al. [18] presented a quality assessment metric suitable for comparing images with significantly different dynamic ranges. This metric implements a model of the HVS that is able to detect only the visible contrast changes. This information is then used to analyze any visible structure changes. The metric is sensitive to three types of structural changes: loss of visible contrast, amplification of invisible contrast, and reversal of visible contrast. Loss of visible contrast is typically generated by a TMO that strongly compresses details, making them invisible in the tone mapped image. Amplification of invisible contrast is the opposite of loss of visible contrast. This is typical when artifacts appear in localized areas of the test image that are not visible in the reference image. Reversal of visible contrast occurs when the contrast is visible in both the reference and test images but with different polarity [18].

Figure 6.7. Flow chart of the dynamic range independent quality assessment metric of Aydin et al. [18].

Figure 6.8. An example of the dynamic range independent quality assessment metric by Aydin et al. [18]. (a) The original HDR image. (b) A single exposure version of the image in (a). (c) The result of the dynamic range independent quality assessment metric: gray means no perceptual error, green means loss of visibility, blue means amplification of invisible contrast, and red means reversal of visible contrast. (The original HDR image is courtesy of Paul Debevec. (c) was generated by Tunc Aydin.)

Figure 6.7 shows the metric's flow chart. A contrast detection prediction step that simulates the mechanisms of the HVS is similar to the one implemented in HDR-VDP. Subsequently, using the cortex transform [228], as modified by Daly [48], the output of the contrast detection predictor is subdivided into several bands with different orientations and spatial bandwidths. The conditional probabilities are used to estimate the three types of distortions for each band. Finally, the three types of distortions are visualized with three different colors: green for the loss of visible contrast, blue for the amplification of invisible contrast, and red for the reversal of visible contrast. Figure 6.8 shows an example of how the dynamic range independent quality assessment metric appears to the user.

6.3 Summary

The original motivation for developing a tone mapper was to display an image on a screen that was perceptually equivalent to the real world (see Figure 3.1). It was only many years later that images produced using TMOs were actually compared with real-world scenes or reference scenes using HDR displays. Not surprisingly, some TMOs were shown to be much better at simulating the real world than others. Similarly, as the results in this chapter show, certain expansion methods are able to create HDR content from LDR images more accurately than others.
Error metrics offer a straightforward and objective means of comparing images. To obtain the perceptual difference between images, as opposed to a simple computational difference, requires the error metric to simulate the HVS. Although there has been substantial progress in modeling the HVS, the complexity of the HVS has yet to be fully understood. Reliance on the results of current perceptual metrics should thus be treated with caution.
Psychophysical experiments, on the other hand, use the real human visual system to compare images. Although not limited by any restrictions of a computer model, these experiments also have their problems. Firstly, to provide meaningful results they should be run with a large number of participants. There is no such thing as the "normal" HVS, and thus only by using large samples can any anomalies in participants' HVSs be sufficiently minimized. In addition, arranging the experiments is time-consuming, and a large number of other factors have to be controlled to avoid bias, such as participant fatigue/boredom, the environment in which the experiment is conducted, etc.
Finally, the evaluation of TMOs and expansion methods has only been conducted on a limited number of images. We can, therefore, not yet say with complete confidence that any method will always be guaranteed to produce perceptually better results than another. Indeed, the work presented in this chapter has already shown that, for example, some methods perform better with darker images than brighter ones. Thorough and careful evaluation is a key part of any attempt to authentically simulate reality. As our understanding of the HVS increases, so too will the computational fidelity of computer metrics.


7
HDR Content Compression

The extra information within an HDR image means that the resultant data files are large. Floating-point representations, which were introduced in Chapter 2, can achieve a reduction down to 32/24 bpp (i.e., RGBE and LogLuv) from the 96 bpp of an uncompressed HDR pixel. However, this memory reduction is not enough and not practical for easily distributing HDR content or storing large databases of images or video. For example, a minute of a high definition movie (1920 × 1080) at 24 fps encoded using 24 bpp LogLuv requires more than 8.3 GB of space, which is nearly double the space of a single-layer DVD. Researchers have been working on more sophisticated compression schemes in the last few years to make storing of HDR content more practical. The main strategy has been to modify and/or adapt current compression standards and techniques such as JPEG, MPEG, and block truncation coding (BTC) to HDR content. This chapter presents a review of the state of the art of these compression schemes for HDR images, textures, and videos.

7.1 HDR Compression MATLAB Framework

This chapter presents two compression algorithms for static images including Matlab code: JPEG-HDR (Section 7.2.1) and HDR-JPEG2000 (Section 7.2.2). Descriptions of HDR texture compression and HDR video compression methods are also provided; however, there are no Matlab implementations for these methods. Some methods need video or texture codecs that can be difficult to set up in Matlab for all development platforms. Moreover, some methods need modifications of the original standard, which would be quite impractical in Matlab without MEX files in C++.

The idea of our image compression framework is to have an encoder function that reduces the range of an HDR image. This is then compressed using a standard LDR image encoder through the imwrite.m function of the Image Processing Toolbox (IPT) of Mathworks [141]. When the compressed image needs to be decompressed, a decoder function decompresses it using the imread.m function of the IPT and then expands its range using additional information stored in an external text file.

7.2 HDR Image Compression

This section introduces the main techniques for HDR image compression. Some of these concepts are used or extended in HDR texture and video compression. The overarching method for HDR compression is to reduce the dynamic range using tone mapping and to encode these images using standard encoding methods (see Figure 7.1). Subsequently, standard decoding and expansion operators are used for decoding. Additional information is stored to enable this subsequent expansion of the tone mapped images and to improve quality, including:

• Tone mapping parameters. These are the parameters of the range reduction function (which has an analytical inverse); they are needed to expand the signal back.

• Spatial inverse functions. These are the inverse tone mapping functions stored per pixel. These functions are obtained by dividing the HDR luminance channel by the tone mapped one. When they vary smoothly, depending on the TMO, they can be subsampled to increase efficiency.

• Residuals. These are usually the differences between the original HDR values and the reconstructed encoded values after quantization. These values significantly improve the quality of the final image because spatial quantization and bit reduction can introduce quantization errors. This can be noticed in the form of noise, enhancement of blocking and ringing, banding artifacts, etc.

The main differences between the various compression schemes are the choice of the LDR encoder, the way in which the range reduction function is calculated, and what data is stored to recover the full HDR image.

Figure 7.1. The general scheme for HDR image compression.
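To make the side information above concrete, the following Matlab fragment sketches how a per-pixel spatial inverse function (a ratio image) and a residual layer could be computed from an HDR image and its tone mapped version. It is only an illustration of the general scheme, not a listing from the HDR Toolbox; lum is the luminance function used later in Listing 7.1, and the variable names are ours.

% img:    HDR image (linear RGB)
% imgTMO: its tone mapped version in [0, 1]
RI  = lum(img) ./ lum(imgTMO);         % spatial inverse function (ratio image)
RId = imresize(RI, 0.5, 'bilinear');   % optional subsampling when RI varies smoothly
% residuals after reconstructing the luminance from the subsampled ratio image
RIup = imresize(RId, [size(img, 1), size(img, 2)], 'bilinear');
res  = lum(img) - RIup .* lum(imgTMO);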

7.2.1 Backward Compatible JPEG-HDR

JPEG-HDR is an extension of the JPEG compression scheme to HDR images by Ward and Simmons [219, 220]. The main idea is to tone map HDR images and to encode them using JPEG. Additional information to recover the compressed range is stored in a spatial function called the ratio image (RI).

Figure 7.2. The encoding pipeline for JPEG-HDR by Ward and Simmons [219, 220].

The encoding (see Figure 7.2) starts with the tone mapping of the HDR image. After this, the original HDR image is divided by the tone mapped one, obtaining the RI, which will be stored as a subband. The RI can be down-sampled, reducing the subband size, because the HVS has a limited ability to detect large and high frequency changes in luminance. This fact was also exploited in Seetzen et al. [190] to improve the efficiency of HDR displays. However, down-sampling needs correction of the image because the naive multiplication of a down-sampled image times the tone mapped LDR image can produce halos/glare around the edges. This problem can be solved in two ways: precorrection and postcorrection. The former method introduces corrections in the tone mapped image. This is achieved by down-sampling and afterward up-sampling the RI image, obtaining RI_d. Subsequently, the original HDR image is divided by RI_d, which produces a tone mapped image with corrections. The latter method consists of an up-sampling with guidance, such as joint bilateral up-sampling [102], but it is more computationally expensive than precorrection. While RI_d is discretized at 8 bits in the logarithmic space and stored in application markers of JPEG, the tone mapped layer needs further processing for preserving colors. Two techniques are employed to solve this problem: compression of the gamut and a new YCbCr encoding.
A global desaturation is performed for the gamut compression. Given the following definition of saturation,

S(x) = 1 - \frac{\min(R_w(x), G_w(x), B_w(x))}{L_w(x)},

the desaturation of each color channel is achieved by

\big[ R_c'(x), G_c'(x), B_c'(x) \big]^T = \big(1 - S(x)^{\alpha}\big) \big[ L_w(x), L_w(x), L_w(x) \big]^T + S(x)^{\alpha} \big[ R_c(x), G_c(x), B_c(x) \big]^T \quad \text{and} \quad S'(x) = S(x)^{\alpha + 1},    (7.1)

where \alpha \le 1 controls the level of saturation kept during color encoding and \beta determines the color contrast. After this step, the image is encoded in a modified YCbCr color space because it has a larger gamut than the RGB color space. Therefore, unused YCbCr values can be exploited to preserve the original gamut of an HDR image. This is achieved by mapping values according to the unused space. For the red channel, the mapping is defined as

\hat{R}(x) = 1.055\, R_c(x)^{0.42} - 0.055          if R_c(x) > 0.0031308,
\hat{R}(x) = 12.92\, R_c(x)                          if |R_c(x)| \le 0.0031308,
\hat{R}(x) = -1.055\, (-R_c(x))^{0.42} + 0.055       if R_c(x) < -0.0031308.

This is repeated for the green and blue channels.
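A small Matlab sketch of this per-channel mapping is given below. The function name is ours and the fragment only mirrors the piecewise equation above; it is not taken from the JPEG-HDR implementation.

function Rm = JPEGHDRColorMapping(Rc)
% Sketch of the sRGB-like mapping used by JPEG-HDR to exploit the
% unused YCbCr space; applied to one color channel at a time.
Rm = zeros(size(Rc));
t = 0.0031308;
Rm(Rc >  t)      =  1.055 * Rc(Rc > t).^0.42 - 0.055;
Rm(abs(Rc) <= t) = 12.92 * Rc(abs(Rc) <= t);
Rm(Rc < -t)      = -1.055 * (-Rc(Rc < -t)).^0.42 + 0.055;
end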
The decoding consists of a few steps; see Figure 7.3 for the complete pipeline. Firstly, the tone mapped layer is decoded using a JPEG decoder and the gamut is expanded by inverting Equation (7.1). After this step, the RI_d image is decoded, expanded (from the logarithmic domain to the linear domain), and up-sampled to the resolution of the tone mapped layer. Finally, the image is recovered by multiplying the tone mapped layer by the RI_d image.

Figure 7.3. The decoding pipeline for JPEG-HDR by Ward and Simmons [219, 220].

A study [219] was conducted to determine a good TMO for compression purposes. This was based on using VDP [48] to compare against the original HDR images. In this experiment, different TMOs were compared: the histogram adjustment [110], the global photographic tone reproduction operator [180], the fast bilateral filtering operator [62], and the gradient operator [67]. Experiments showed that the fast bilateral filtering operator performed the best, followed by the photographic tone reproduction one. A second study was carried out to test image quality and compression rates on a data set of 217 HDR images. The data set was compressed using JPEG-HDR at different quality settings, using the global photographic operator, bilateral filter, histogram operator, and gradient domain operator. The HDR images compressed using JPEG-HDR were compared with the original ones using VDP to study the quality of the image. The study showed that the method can achieve a compression rate between 0.6–3.75 bpp for quality settings between 57–99%. However, quality degrades rapidly for JPEG quality below 60%; only 2.5% of pixels were visibly different with the quality set at 90%, and only 0.1% at maximum quality.
JPEG-HDR provides good quality, 0.1–2.5% perceptual error, while consuming a small amount of memory, 0.6–3.75 bpp. Moreover, the method is backward compatible because RI_d is encoded using only extra application markers of the JPEG format. When an application not designed for HDR imaging opens a JPEG-HDR file, it displays only the tone mapped layer, allowing the user to have access to part of the content.


if(~exist('quality'))
    quality = 95;
end

quality = ClampImg(quality, 1, 100);

% Tone mapping using Reinhard's operator
gamma = 2.2;
invGamma = 1.0 / gamma;
[imgTMO, pAlpha, pWhite] = ReinhardTMO(img);

% Ratio image
RI = lum(img) ./ lum(imgTMO);
[r, c, col] = size(img);

% JPEG quantization of the ratio image
flag = 1;
scale = 1;
nameRatio = [nameOut, '_ratio.jpg'];
while(flag)
    RItmp = imresize(RI, scale, 'bilinear');
    RIenc = log2(RItmp + 2^-16);
    RIenc = (ClampImg(RIenc, -16, 16) + 16) / 32;
    % Ratio images are stored with maximum quality
    imwrite(RIenc.^invGamma, nameRatio, 'Quality', 100);
    scale = scale - 0.005;
    % stop when the stored ratio image is smaller than 64 KB
    valueDir = dir(nameRatio);
    flag = (valueDir.bytes / 1024) > 64;
end

imgRI = (double(imread(nameRatio)) / 255).^gamma;
imgRI = ClampImg(imgRI * 32 - 16, -16, 16);
imgRI = 2.^imgRI;
imgRI = imresize(imgRI, [r, c], 'bilinear');

% Tone mapped image corrected with the optimized ratio image
for i=1:3
    imgTMO(:,:,i) = img(:,:,i) ./ imgRI;
end
imgTMO = RemoveSpecials(imgTMO);

% Clamping using the 0.999th percentile
maxTMO = MaxQuart(imgTMO, 0.999);
imgTMO = ClampImg(imgTMO / maxTMO, 0, 1);
imwrite(imgTMO.^invGamma, [nameOut, '_tmo.jpg'], 'Quality', quality);

% output tone mapping data
fid = fopen([nameOut, '_data.txt'], 'w');
fprintf(fid, 'maxTMO: %g\n', maxTMO);
fclose(fid);

end

Listing 7.1. Matlab Code: JPEG-HDR encoder implementing the compression method by Ward and Simmons [219, 220].


The code for the encoder of JPEG-HDR is shown in Listing 7.1. The full code can be found in the file JPEGHDREnc.m under the folder Compression. The function takes as input the HDR image to compress, img, the output name for the compressed image, nameOut, and the JPEG quality setting, quality, a value in the range [1, 100], where 1 and 100 respectively mean the lowest and the highest quality values.
Firstly, the function checks if quality was set by the user; otherwise it sets it to a default value of 95. Afterwards, the image is tone mapped using the photographic tone reproduction operator [180], calling the function ReinhardTMO.m in order to reduce the high dynamic range. This output is stored in imgTMO. At this point the ratio image, RI, is computed as the ratio between the luminance of img and that of imgTMO. Then the function enters a while loop to minimize the size of RI until it is below 64 KB (during this process RI is stored as a JPEG file). To achieve this, the image is downsampled using the function imresize.m. Finally, the original image is tone mapped with the optimized RI and stored as a JPEG file using imwrite.m. Additional information about the normalization process of the tone mapped image is saved in a text file.
gamma = 2.2;

% Read the tone mapping data
fid = fopen([name, '_data.txt'], 'r');
fscanf(fid, '%s', 1);
maxTMO = fscanf(fid, '%g', 1);
fclose(fid);

% Read the tone mapped layer
imgTMO = maxTMO * ((double(imread([name, '_tmo.jpg'])) / 255).^gamma);
[r, c, col] = size(imgTMO);

% Read the RI layer
imgRI = (double(imread([name, '_ratio.jpg'])) / 255).^gamma;
imgRI = ClampImg(imgRI * 32 - 16, -16, 16);
imgRI = 2.^imgRI;
imgRI = imresize(imgRI, [r, c], 'bilinear');

% Decoded image
imgRec = zeros(size(imgTMO));
for i=1:3
    imgRec(:,:,i) = imgTMO(:,:,i) .* imgRI;
end
imgRec = RemoveSpecials(imgRec);

end

Listing 7.2. Matlab Code: JPEG-HDR decoder implementing the compression method by Ward and Simmons [219, 220].


The code for decoding is shown in Listing 7.2. The full code of the decoder can be found in the file JPEGHDRDec.m under the folder Compression. The function takes as input the name of the compressed image (without any file extension, i.e., similar to the input of the encoder). Note that the decoding process is quite straightforward: it just reverses the order of operations of the encoder (there is no minimization process).
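As a usage sketch, an encode/decode round trip with these two functions might look as follows. The file name, the quality value, and the exact signatures are assumptions based on the fragments shown above, not a prescribed calling convention.

% img is an HDR image already loaded in memory (linear RGB, double)
JPEGHDREnc(img, 'bottles', 90);     % writes bottles_tmo.jpg, bottles_ratio.jpg, bottles_data.txt
imgRec = JPEGHDRDec('bottles');     % reconstructs an approximation of img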

7.2.2 HDR-JPEG2000

Xu et al. [236] proposed a straightforward preprocessing technique that enables the JPEG2000 standard [37] to encode HDR images. The main concept is to transform floating-point data into unsigned short integers (16-bit) that are supported by the JPEG2000 standard.
The encoding phase starts with the reduction of the dynamic range by applying the natural logarithm to the RGB values:

\big[ \overline{R}_w(x), \overline{G}_w(x), \overline{B}_w(x) \big]^T = \big[ \log R_w(x), \log G_w(x), \log B_w(x) \big]^T.

Then, the floating-point values in the logarithmic domain are discretized to unsigned short integers:

\big[ \hat{R}_w(x), \hat{G}_w(x), \hat{B}_w(x) \big]^T = \big[ f(\overline{R}_w(x), n), f(\overline{G}_w(x), n), f(\overline{B}_w(x), n) \big]^T, \qquad f(x, n) = (2^n - 1)\, \frac{x - x_{\min}}{x_{\max} - x_{\min}},    (7.2)

where x_{\max} and x_{\min} are respectively the maximum and minimum values for the channel of x, and n = 16. Finally, the image is compressed using a JPEG2000 encoder.
To decode, the image is first decompressed using a JPEG2000 decoder, then it is converted from integer into floating-point values by inverting Equation (7.2), and the result is subsequently exponentiated:

\big[ R_w(x), G_w(x), B_w(x) \big]^T = \big[ e^{g(\hat{R}_w(x), n)}, e^{g(\hat{G}_w(x), n)}, e^{g(\hat{B}_w(x), n)} \big]^T, \qquad g(x, n) = f^{-1}(x, n) = \frac{x}{2^n - 1}\,(x_{\max} - x_{\min}) + x_{\min}.    (7.3)
The method was compared in JPEG2000 lossy mode against JPEG-HDR [220] and HDRV [133], and in JPEG2000 lossless mode against RGBE [221], LogLuv [111], and OpenEXR [89]. The employed metrics were RMSE in the logarithmic domain and Lubin's VDM [127]. The results of these comparisons showed that HDR-JPEG2000 in lossy mode is superior to JPEG-HDR and HDRV, especially at low bit rates, when these methods show artifacts. Nevertheless, the method does not perform well when lossless JPEG2000 is used, because the file size is higher than the file size when using RGBE, LogLuv, and OpenEXR (these methods are lossy in the float precision, but not spatially).
The HDR-JPEG2000 algorithm is a straightforward method for lossy compression of HDR images at high quality without artifacts at low bit rates. However, the method is not suitable for real-time applications because fixed-time look-ups are needed. Also, the method does not exploit all the compression capabilities of JPEG2000, as it operates at a high level. For example, separate processing for luminance and chromaticity could reduce the size of the final image while keeping the same quality.
The code for the encoder of the HDR-JPEG2000 method is shown in Listing 7.3. The full code of the encoder can be found in the file HDRJPEG2000Enc.m under the folder Compression.
if(~exist('compRatio'))
    compRatio = 2;
end

if(compRatio < 1)
    compRatio = 1;
end

delta = 1e-6;

% Range reduction
nBit = 16;
imgLog = log(img + delta);

xMin = zeros(3, 1);
xMax = zeros(3, 1);
for i=1:3
    xMin(i) = min(min(imgLog(:,:,i)));
    xMax(i) = max(max(imgLog(:,:,i)));
    imgLog(:,:,i) = (imgLog(:,:,i) - xMin(i)) / (xMax(i) - xMin(i));
end

imgLog = uint16(imgLog * (2^nBit - 1));
imwrite(imgLog, [nameOut, '_comp.jp2'], 'CompressionRatio', compRatio, 'mode', 'lossy');

% output range reduction data
fid = fopen([nameOut, '_data.txt'], 'w');
for i=1:3
    fprintf(fid, 'xMax: %g xMin: %g\n', xMax(i), xMin(i));
end
fclose(fid);

end

Listing 7.3. Matlab Code: The HDR-JPEG2000 encoder implementing the compression method by Xu et al. [236].


The function takes as input the HDR image to compress, img, the output name for the compressed image, nameOut, and the compression ratio, compRatio, which has to be set greater than one.
Firstly, the function checks if compRatio was set by the user; otherwise it sets it to 1, which means that img will be compressed at maximum quality. At this point, the image is transformed into the logarithmic domain and stored in imgLog, and each color channel is separately normalized in [0, 1]. Then, imgLog is saved as a JPEG2000 file using the imwrite.m function (note that images can be saved in the JPEG2000 format only from Matlab version 2010a onwards). Finally, the values used to normalize each color channel, xMin and xMax, are stored in a text file.
The code for decoding is shown in Listing 7.4. The full code of the decoder can be found in the file HDRJPEG2000Dec.m under the folder Compression.
% Read the range reduction data
fid = fopen([name, '_data.txt'], 'r');
xMin = zeros(3, 1);
xMax = zeros(3, 1);
for i=1:3
    fscanf(fid, '%s', 1);
    xMax(i) = fscanf(fid, '%g', 1);
    fscanf(fid, '%s', 1);
    xMin(i) = fscanf(fid, '%g', 1);
end
fclose(fid);

delta = 1e-6;
nBit = 16;

% Read the JPEG2000 file and invert the range reduction
imgLog = double(imread([name, '_comp.jp2'])) / (2^nBit - 1);
for i=1:3
    imgLog(:,:,i) = imgLog(:,:,i) * (xMax(i) - xMin(i)) + xMin(i);
end

% Expansion from the logarithmic domain to the linear domain
imgRec = exp(imgLog) - delta;
imgRec = RemoveSpecials(imgRec);

end

Listing 7.4. Matlab Code: The HDR-JPEG2000 decoder implementing the compression method by Xu et al. [236].


The function takes as input the name of the compressed image (without any file extension, i.e., similar input to the encoder). Note that the decoding process is quite straightforward and it just reverses the order of operations of the encoder.
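Similarly to the JPEG-HDR example earlier, a round trip with these two functions might look as follows; the file name, the compression ratio, and the exact signatures are assumptions based on the descriptions above.

% img is an HDR image already loaded in memory (linear RGB, double)
HDRJPEG2000Enc(img, 'memorial', 4);    % writes memorial_comp.jp2 and memorial_data.txt
imgRec = HDRJPEG2000Dec('memorial');   % inverts the normalization and the logarithm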

7.2.3 Two-Layer Coding Algorithm for High Dynamic Range Images

Okuda and Adami [159] proposed a scheme similar to JPEG-HDR with backward compatibility. The main differences are the presence of a minimization step for optimizing the tone mapping parameters, the compression of residuals using wavelets, and the use of the Hill function for tone mapping and its analytic inverse instead of a low resolution image as in JPEG-HDR. The Hill function is a generalized sigmoid function that is defined as

L_d(x) = f(L_w(x)) = \frac{L_w(x)^n}{L_w(x)^n + k^n},

where n and k are constants that depend on the image. The inverse g of f is given by

L_w(x) = g(L_d(x)) = f^{-1}(L_d(x)) = k \left( \frac{L_d(x)}{1 - L_d(x)} \right)^{\frac{1}{n}}.
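The pair of mappings can be written directly in Matlab. The following fragment is only a sketch with our own helper names and arbitrary constants; it is not taken from Okuda and Adami's implementation.

% Hill function tone mapping and its analytic inverse (sketch).
hill    = @(Lw, n, k) (Lw.^n) ./ (Lw.^n + k^n);
hillInv = @(Ld, n, k) k * (Ld ./ (1 - Ld)).^(1 / n);

% Example with arbitrary constants on a synthetic luminance range:
n = 0.7; k = 0.18;
Lw = logspace(-3, 3, 7);          % HDR luminance samples
Ld = hill(Lw, n, k);              % forward tone mapping, values in (0, 1)
LwRec = hillInv(Ld, n, k);        % reconstruction, equal to Lw up to rounding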

The encoding is divided into a few steps (see Figure 7.4 and Figure 7.5).

Figure 7.4. The encoding pipeline of Okuda and Adami's method [159]. (The original HDR image is courtesy of Ahmet Oğuz Akyüz.)

Figure 7.5. The decoding pipeline of Okuda and Adami's method [159]. (The original HDR image is courtesy of Ahmet Oğuz Akyüz.)

Firstly, a minimization process using the original HDR image is performed in the logarithmic domain to match HVS perception and to avoid outliers at high values. The error to minimize is given by

E = \sum_{x \in I} \big[ \log(L_w(x)) - \log(g(L_d(x))) \big]^2    (7.4)

for determining n and k. The optimum solution is uniquely determined by imposing the partial derivatives of E with respect to k and n equal to zero, leading to

k = \exp\left( \frac{\sum_{x} A(x) \sum_{x} B(x)^2 - \sum_{x} B(x) \sum_{x} A(x)B(x)}{M \sum_{x} B(x)^2 - \big(\sum_{x} B(x)\big)^2} \right)

and

n = \frac{M \sum_{x} B(x)^2 - \big(\sum_{x} B(x)\big)^2}{M \sum_{x} A(x)B(x) - \sum_{x} A(x) \sum_{x} B(x)},

where M is the number of pixels, and A and B are defined as

A(x) = \log L_w(x), \qquad B(x) = \log\left( \frac{L_d(x)}{1 - L_d(x)} \right).
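Since these are the normal equations of a linear fit of A against B, the parameters can be estimated with a few lines of Matlab. The sketch below assumes Lw and Ld are the HDR luminance and its tone mapped counterpart, stored as arrays of the same size; the variable names are ours.

% Closed-form estimation of the Hill function parameters n and k (sketch).
Ld = min(max(Ld, 1e-6), 1 - 1e-6);   % keep Ld strictly inside (0, 1)
A = log(Lw(:));                      % A(x) = log Lw(x)
B = log(Ld(:) ./ (1 - Ld(:)));       % B(x) = log(Ld(x) / (1 - Ld(x)))
M = numel(A);

den = M * sum(B.^2) - sum(B)^2;
n = den / (M * sum(A .* B) - sum(A) * sum(B));
k = exp((sum(A) * sum(B.^2) - sum(B) * sum(A .* B)) / den);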
Once the parameters are determined, the image is tone mapped and encoded using JPEG. To improve quality, residuals are calculated as

R(x) = \left( \frac{L_w(x)}{g(L_d(x)) + \epsilon} \right)^{\gamma},

where \gamma \in (0, 1] is a constant, and \epsilon \ge 0 is a small value, chosen by the user, to avoid discontinuities. Finally, R is encoded using a wavelet image compression scheme.
Once the LDR image and residuals are decoded using a JPEG decoder and a wavelet decoder, the final HDR values are recovered by

L_w(x) = R(x)^{\frac{1}{\gamma}} \big( g(L_d(x)) + \epsilon \big).

Two color compensation methods are presented to preserve distortions caused by tone mapping. The first one is a modification of Ward and Simmons [220] where \alpha and \beta are calculated with a quadratic minimization using an error function similar to Equation (7.4). The second method is to apply a polynomial P(x) to each LDR color channel, assuming that a polynomial relationship exists between LDR and HDR values. The coefficients of P(x) are fitted using the Gaussian weighted difference between the original HDR channel and the reconstructed HDR channel.
The compression scheme was evaluated on a data set of 12 HDR images and compared with JPEG-HDR and HDR-MPEG using two metrics: the mean square difference (MSE) in CIELAB color space [64] and MSE in Daly's nonlinearity domain [48]. In these experiments, the proposed method achieved better results for both metrics in comparison with JPEG-HDR and HDR-MPEG at different bit rates. While the quality of this method is up to two times better than HDR-MPEG and JPEG-HDR at high bit rates (around 8–10 bits), it is comparable to them at low bit rates (around 1–4 bits).

7.3 HDR Texture Compression

The focus of this section is on the compression of HDR textures, which are images used in computer graphics for increasing the detail of materials in a three-dimensional scene. The main difference between compressed textures and images is that the fetch of a pixel value has to happen in constant time, allowing random access to the information. The drawback of having random access is the limit in compression rates, around 8 bpp, because spatial redundancy is not fully exploited. Note that in the schemes of the previous section, the full image needs to be decoded to have access to pixels.
Most of the texture schemes are based on block truncation coding (BTC) [84]. BTC is a compression scheme that divides the image into 4 × 4 pixel blocks. For each block the average value, m, is calculated and each pixel, x, is encoded as 0 if x < m, and as 1 otherwise. Then, the mean of each group of pixels is calculated and stored. During the decoding, the mean of each group is assigned to its pixels (see Figure 7.6).

Figure 7.6. An example of BTC.

A typical BTC scheme for textures is S3TC by Iourcha et al. [90], inspired by Knittel et al.'s work [100], which is termed DXTC in Direct3D (DXT1, DXT2, DXT3, DXT4, and DXT5 are variants of DXTC). This scheme tries to fit the pixel values of a block to a line in the color space using a minimization process. The encoded values are the two base colors of the line, discretized using 16 bits, and, for each pixel, the point on the line discretized using 2 bits. During the decoding, pixels are linearly interpolated from the base colors that encode the line.
The texture compression schemes in this section are a trade-off between compression rates, hardware support, speed of encoding, speed of decoding, and quality. The choice of the correct scheme depends on the constraints of the application.

7.3.1 HDR Texture Compression Using Geometry Shapes

One of the first BTC schemes for HDR textures was proposed by Munkberg et al. [151]. This scheme compresses 48 bpp HDR textures into 8 bpp, leveraging a logarithmic encoding of luminance and geometry shape fitting for the chrominance channel.
The first step in the encoding scheme is to transform the data into a color space where luminance and chrominance are separated to achieve a better compression. For this task, the authors defined a color space, \overline{Y}\overline{u}\overline{v}, as

Y_w(x) = 0.299\, R_w(x) + 0.587\, G_w(x) + 0.114\, B_w(x),

\overline{Y}_w(x) = \log_2 Y_w(x), \qquad \overline{u}_w(x) = \frac{0.114\, B_w(x)}{Y_w(x)}, \qquad \overline{v}_w(x) = \frac{0.299\, R_w(x)}{Y_w(x)},

where \overline{u} and \overline{v} are in [0, 1], with \overline{u} + \overline{v} \le 1. The image is then divided into 4 × 4 pixel blocks. For each block, the maximum, \overline{Y}_{\max}, and the minimum, \overline{Y}_{\min}, luminance values are calculated (see Table 7.1). These values are quantized at 8 bits and stored to be used as base luminance values for the interpolation, in a similar way to S3TC [90]. Moreover, the other luminance values are encoded with 2 bits, which minimize the value of the interpolation between \overline{Y}_{\min} and \overline{Y}_{\max}.

Table 7.1. The table shows the bit allocation for a 4 × 4 block in Munkberg et al.'s method [151].

At this point, chrominance values are compressed. The first step is to halve the resolution of the chrominance channel. For each block, a two-dimensional shape is chosen as the one that fits the chrominance values in the (\overline{u}, \overline{v}) plane, minimizing the error (see Figure 7.7). Finally, a 2-bit index is stored for each pixel that points to a sample along the fitted two-dimensional shape.
index is stored for each pixel that points to a sample along the tted twodimensional shape.
In the decoding scheme, luminance is firstly decompressed, interpolating values for each pixel in the block as

\overline{Y}_w(y) = \frac{1}{3} \Big( \overline{Y}_{k(y)}\, \overline{Y}_{\min} + \big(3 - \overline{Y}_{k(y)}\big)\, \overline{Y}_{\max} \Big),

where \overline{Y}_{k(y)} \in \{0, 1, 2, 3\} is the 2-bit luminance index that corresponds to the pixel at location y. The chrominance is then decoded as

\big[ \overline{u}_w(y), \overline{v}_w(y) \big]^T = \alpha(ind_{k(y)}) \big[ \overline{u}_{end} - \overline{u}_{start},\; \overline{v}_{end} - \overline{v}_{start} \big]^T + \beta(ind_{k(y)}) \big[ \overline{v}_{start} - \overline{v}_{end},\; \overline{u}_{start} - \overline{u}_{end} \big]^T + \big[ \overline{u}_{start}, \overline{v}_{start} \big]^T,

where \alpha and \beta are parameters specific to each two-dimensional shape.
Subsequently, the chrominance is up-sampled to the original size. Finally, the inverse \overline{Y}\overline{u}\overline{v} color space transform is applied, obtaining the reconstructed pixel

\big[ R_w(x), G_w(x), B_w(x) \big]^T = 2^{\overline{Y}_w(x)} \big[ \overline{v}_w(x)\, 0.299^{-1},\; \big(1 - \overline{v}_w(x) - \overline{u}_w(x)\big)\, 0.587^{-1},\; \overline{u}_w(x)\, 0.114^{-1} \big]^T.
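The color transform and its inverse are simple enough to verify in Matlab. The following fragment is our own sketch of the equations above, without the block quantization steps; img is assumed to be a positive-valued HDR RGB image.

% Forward transform: RGB -> (log2 Y, u, v), as described above (sketch).
Y  = 0.299 * img(:,:,1) + 0.587 * img(:,:,2) + 0.114 * img(:,:,3);
Yb = log2(Y);                        % \bar{Y}
u  = 0.114 * img(:,:,3) ./ Y;        % \bar{u}
v  = 0.299 * img(:,:,1) ./ Y;        % \bar{v}

% Inverse transform: (log2 Y, u, v) -> RGB
imgRec = zeros(size(img));
imgRec(:,:,1) = 2.^Yb .* v / 0.299;
imgRec(:,:,2) = 2.^Yb .* (1 - u - v) / 0.587;
imgRec(:,:,3) = 2.^Yb .* u / 0.114;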
The compression scheme was compared against two HDR S3TC variants using mPSNR [151], log2[RGB] RMSE [236], and HDR-VDP [134, 135]. A data set of 16 HDR textures was tested. The results showed that the method presents higher quality than the S3TC variants, especially perceptually.
The method proposed by Munkberg et al. [151] is a compression scheme that can achieve 8 bpp HDR texture compression at high quality. However, the decompression method needs special hardware, so it cannot be implemented on current graphics hardware. Furthermore, the shape fitting can take up to an hour for a one-megapixel image, which limits the scheme to fixed content.

Figure 7.7. The encoding of chrominance in Munkberg et al. [151]. (a) Two-dimensional shapes used in the encoder. Black circles are for start and end points, white circles are for interpolated values. (b) An example of a two-dimensional shape fitting for a chrominance block.

7.3.2 HDR Texture Compression Using Bit and Integer Operations

Concurrently with Munkberg et al. [151], Roimela et al. [186] presented an 8 bpp BTC scheme for compressing 48 bpp HDR textures, which was later improved in Roimela et al. [187].
The first step of the coding scheme is to convert the texture data into a color space suitable for compression purposes. A computationally efficient color space is defined, which splits RGB colors into luminance and chromaticity. The luminance I_w(x) is defined as

I_w(x) = \frac{1}{4} R_w(x) + \frac{1}{2} G_w(x) + \frac{1}{4} B_w(x),

and the chromaticity [r_Q, b_Q] as

\big[ r_Q, b_Q \big]^T = \frac{1}{4 I_w(x)} \big[ R_w(x), B_w(x) \big]^T.

Table 7.2. The table shows the bit allocation for a 4 × 4 block in Roimela et al.'s method [186].

Then, the image is divided into 4 × 4 pixel blocks. For each block, the luminance value with the smallest bit pattern is calculated, I_min, and its ten least significant bits are zeroed, giving I_bias (only 6 bits are stored). Subsequently, I_bias is subtracted bit by bit from all luminance values in the block:

bit(I_w(y)) = bit(I_w(y)) - bit(I_{bias}),

where the bit operator denotes the integer bit representation of a floating-point number and y is a pixel in the block under processing. The values I_w(y) share a number of leading zero bits that do not need to be stored. Therefore, they are counted in the largest I_w(y). The counter, n_zero, is clamped to seven and stored in 3 bits. At this point, the n_zero + 1 least important bits are removed from each I_w(y) in the block, obtaining lum_w(y), which is rounded and stored as 5 bits. Chromaticity is now compressed. Firstly, the resolution of the chromaticity channels is halved. Secondly, the same compression scheme used for luminance is applied to chromaticity, giving two bias values at 6 bits, one for r_Q, r_{Q,bias}, and the other for b_Q, b_{Q,bias}. Furthermore, there is a common zero counter, c_zero, and the final values are rounded to 4 bits. The number of bits for the luminance and chromaticity channels are respectively 88 bits and 40 bits, for a total of 128 bits or 8 bpp. Table 7.2 shows the complete allocation of bits.
To decode, firstly, luminance is decoded by bit shifting each lum_w(y) value n_zero + 1 times to the left and adding I_bias. Secondly, this operation is repeated for the chromaticity channel, which is subsequently up-sampled to the original size. Finally, the image is converted from the I r_Q b_Q color space to RGB by applying the inverse transform

\big[ R_w(x), G_w(x), B_w(x) \big]^T = I_w(x)\, \mathrm{diag}(4, 2, 4)\, \big[ r_{Q,w}(x),\; 1 - r_{Q,w}(x) - b_{Q,w}(x),\; b_{Q,w}(x) \big]^T.
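The color space conversion and its inverse can be sketched in Matlab as follows. This is our own fragment, with the bit-level luminance packing omitted; img is assumed to be a positive-valued HDR RGB image.

% Forward: RGB -> (I, rQ, bQ), as defined above (sketch).
I  = 0.25 * img(:,:,1) + 0.5 * img(:,:,2) + 0.25 * img(:,:,3);
rQ = img(:,:,1) ./ (4 * I);
bQ = img(:,:,3) ./ (4 * I);

% Inverse: (I, rQ, bQ) -> RGB
imgRec = zeros(size(img));
imgRec(:,:,1) = I .* (4 * rQ);
imgRec(:,:,2) = I .* (2 * (1 - rQ - bQ));
imgRec(:,:,3) = I .* (4 * bQ);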
This scheme was compared against Munkberg et al.'s scheme [151], HDR-JPEG2000 [236], and an HDR S3TC variant using different metrics: PSNR, mPSNR [151], HDR-VDP [134, 135], and RMSE. A data set of 18 HDR textures was tested. The results showed that the encoding method has quality similar to RGBE. Moreover, it is similar to Munkberg et al.'s scheme [151], but the chromaticity quality is lower.
This compression scheme presents a computationally efficient encoding/decoding scheme for 48 bpp HDR textures. Only integer and bit operations are needed. Furthermore, it achieves high quality images at only 8 bpp. However, the main drawback of the scheme is that it cannot be implemented on current graphics hardware.

7.3.3 HDR Texture Compression Encoding LDR and HDR Parts

Methods such as the ones of Munkberg et al. [151] and Roimela et al. [186, 187] have the main problem that they cannot be implemented on current graphics hardware. To solve this problem, Wang et al. [217] proposed a compression method based on S3TC [90]. The main idea is to split the HDR image into two parts: one part with LDR values and the second part with HDR values (see Figure 7.8). The two parts are stored in two S3TC textures for a total of 16 bpp.
The encoding starts by splitting luminance and chrominance using the LUVW color space, which is defined as

L_w(x) = \sqrt{R_w(x)^2 + G_w(x)^2 + B_w(x)^2}, \qquad \big[ U_w(x), V_w(x), W_w(x) \big]^T = \frac{1}{L_w(x)} \big[ R_w(x), G_w(x), B_w(x) \big]^T.
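A Matlab sketch of the LUVW transform and its inverse (our own fragment, assuming img is an HDR RGB image) is:

% Forward: RGB -> (L, U, V, W), as defined above (sketch).
L = sqrt(img(:,:,1).^2 + img(:,:,2).^2 + img(:,:,3).^2);
U = img(:,:,1) ./ L;
V = img(:,:,2) ./ L;
W = img(:,:,3) ./ L;

% Inverse: (L, U, V, W) -> RGB
imgRec = cat(3, L .* U, L .* V, L .* W);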
After the color conversion, the luminance channel is split into an HDR and an LDR part. This is achieved by finding the threshold, L_{w,s}, that minimizes the quantization error of encoding the LDR and HDR parts uniformly and separately. The error is defined as

E(L_{w,s}) = n_{LDR}(L_{w,s}) \frac{L_{w,s} - L_{w,\min}}{2^{b_{LDR}}} + n_{HDR}(L_{w,s}) \frac{L_{w,\max} - L_{w,s}}{2^{b_{HDR}}},

where n_{LDR} and n_{HDR} are respectively the number of pixels in the LDR and HDR parts. The variables b_{LDR} and b_{HDR} are respectively the number of bits for quantizing the LDR and HDR parts.

Figure 7.8. An example of the separation process of the LDR and HDR parts in Wang et al. [217] applied to the Bristol Bridge HDR image. (a) The histogram of the image; the axis that divides the image into LDR and HDR parts is shown in red. (b) The LDR part of the image, uniformly quantized. (c) The HDR part of the image, uniformly quantized. (The original HDR image is courtesy of Gregory J. Ward [225].)

The HDR texture is stored in two S3TC textures, Tex_0 and Tex_1, as

Tex_{0_R}(x) = U_w(x), \qquad Tex_{0_G}(x) = V_w(x), \qquad Tex_{0_B}(x) = W_w(x),

Tex_{0_A}(x) = \frac{L_w(x)}{L_{w,s} - L_{w,\min}}   if L_{w,\min} \le L_w(x) \le L_{w,s},   and 0 otherwise;

Tex_{1_A}(x) = \frac{L_w(x)}{L_{w,\max} - L_{w,s}}   if L_{w,s} < L_w(x) \le L_{w,\max},   and 0 otherwise;

where the subscripts R, G, B, and A respectively indicate the red, green, blue, and alpha channels of a texture. Furthermore, additive residuals are included to improve quality. These are simply calculated as

res(x) = L_w(x) - \tilde{L}_w(x),

where \tilde{L}_w(x) is the reconstructed luminance, which is defined as

\tilde{L}_w(x) = Tex_{0_A}(x)(L_{w,\max} - L_{w,s}) + Tex_{1_A}(x)(L_{w,s} - L_{w,\min}) + L_{w,\min}.

Then, res is split into three parts in a similar way to the luminance and stored in the red, green, and blue channels of Tex_1.

Figure 7.9. An example of failure of the compression method of Wang et al. [217] applied to the Saint Peter's Basilica HDR image. (a) The image at exposure 0. (b) A zoom of the red square in (a) from the original image. (c) A zoom of the red square in (a) from the compressed image. Note that quantization artifacts are visible in the form of contouring. (The original HDR image is courtesy of Paul Debevec [53].)

For decoding, firstly the S3TC textures are decoded, then the luminance channel is reconstructed as

L_w(x) = \tilde{L}_w(x) + Tex_{1_R}(x)(res_{s1} - res_{\min}) + Tex_{1_G}(x)(res_{s2} - res_{s1}) + Tex_{1_B}(x)(res_{\max} - res_{s2}) + res_{\min}.    (7.5)

Finally, the image is converted from the LUVW color space to the RGB color space:

\big[ R_w(x), G_w(x), B_w(x) \big]^T = L_w(x) \big[ U_w(x), V_w(x), W_w(x) \big]^T.

The compression scheme was compared against the RGBE and OpenEXR formats using classic PSNR, and the quality was worse than RGBE (5–7 dB less) and OpenEXR (40–60 dB less). Nevertheless, the method needs only 16 bpp compared with the 32 bpp used by RGBE and the 48 bpp used by OpenEXR.
This compression scheme presents an acceptable quality and, since only simple operations are used in Equation (7.5), it can be mapped onto current graphics hardware at high frame rates (470–480 fps with a 512² viewport and texturing for each pixel). The main drawback of the method is that an optimal L_{w,s} for an image can generate quantization artifacts that are noticeable (see Figure 7.9).

7.3.4 HDR Texture Compression with Tone Mapping and Its Analytic Inverse

Banterle et al. [21] presented a compression scheme for textures that shares some similarities with Okuda and Adami's approach [159]. This method was designed to directly take advantage of graphics hardware. The generalized framework presented uses a minimization process that takes into account the compression scheme for tone mapped images and residuals. Moreover, it was shown that up-sampling of tone mapped values before expansion does not introduce visible errors.
The authors employed the global Reinhard et al. operator [180] and its inverse [19] in their implementation. The forward operator is defined as

L_d(x) = f(L_w(x)) = \frac{\alpha L_w(x) \big( \alpha L_w(x) + L_{w,H} L_{white}^2 \big)}{L_{w,H} L_{white}^2 \big( \alpha L_w(x) + L_{w,H} \big)},

\big[ R_d(x), G_d(x), B_d(x) \big] = \frac{L_d(x)}{L_w(x)} \big[ R_w(x), G_w(x), B_w(x) \big],

where L_{white} is the luminance white point, L_{w,H} is the logarithmic average, and \alpha is the scale factor. The inverse is given by

L_w(x) = g(L_d(x)) = f^{-1}(L_d(x)) = \frac{L_{white}^2 L_{w,H}}{2\alpha} \left( L_d(x) - 1 + \sqrt{ (1 - L_d(x))^2 + \frac{4 L_d(x)}{L_{white}^2} } \right),

\big[ R_w(x), G_w(x), B_w(x) \big] = \frac{L_w(x)}{L_d(x)} \big[ R_d(x), G_d(x), B_d(x) \big].
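A Matlab sketch of the inverse operator g, which is what the decoder evaluates per texel, is given below. This is our own fragment: pAlpha, pWhite, and LwH stand for the scale factor, the luminance white point, and the logarithmic average estimated during encoding, and Ld and imgTMO are assumed to be the tone mapped luminance and image.

% Inverse global Reinhard operator (sketch): expands tone mapped luminance Ld.
Lw = (pWhite^2 * LwH) / (2 * pAlpha) * ...
     (Ld - 1 + sqrt((1 - Ld).^2 + 4 * Ld / pWhite^2));

% Colors are then rescaled by the luminance ratio:
imgExp = zeros(size(imgTMO));
for i=1:3
    imgExp(:,:,i) = imgTMO(:,:,i) .* (Lw ./ Ld);
end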
The rst stage of encoding is to estimate parameters of the TMO, similarly to [181], and to apply a color transformation. Figure 7.10 shows the
encoding pipeline. However, this last step can be skipped because S3TC
does not support color spaces with separated luminance and chromaticity.
Subsequently, the HDR texture and estimated values are used as input in

214

7. HDR Content Compression

Figure 7.10. The encoding pipeline presented in Banterle et al. [21]. (The Eucalyptus Grove HDR environment map is courtesy of Paul Debevec.)

a Levenberg-Marquadt minimization loop, which ends when the local optimum for TMO parameters is reached. In the loop, the HDR texture is
rstly tone mapped and encoded with S3TC. Secondly, residuals are calculated and encoded using S3TC. Finally, the image is reconstructed, the
error is calculated, and new TMO parameters are estimated. When the
local optimum is reached, the HDR texture is tone mapped with these
parameters and encoded using S3TC with residuals in the alpha channel.
The decoding stage can be implemented in a simple shader on a GPU.
The decoding pipeline is shown in Figure 7.11. When a texel is needed in a
shader, the tone mapped texture is fetched and its luminance is calculated.
The inverse tone mapping uses these luminance values, combined with the
TMO parameters, to obtain the expanded values that are then added to
the residuals. Finally, luminance and colors are recombined. Note that

Figure 7.11. The decoding pipeline presented in Banterle et al. [21]. (The Eucalyptus Grove HDR environment map is courtesy of Paul Debevec.)


Figure 7.12. A comparison of real-time decoding schemes on current graphics hardware applied to the St. Peter's Cathedral environment map. Banterle et al.'s scheme [21] (left) and Wang et al.'s scheme [218] (right), the latter showing visible contouring artifacts. (The Saint Peter's Basilica HDR environment map is courtesy of Paul Debevec. The Happy Buddha model is courtesy of the Stanford 3D Models Repository.)

the inverse operator can be precomputed into a one-dimensional texture to speed up the decoding. Moreover, computations can be sped up by applying filtering during the fetch of the tone mapped texture. This is because the filtering is applied to the coefficients of a polynomial function. Banterle et al. proposed a bound on this error, showing that it is not significant in many cases.
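As a rough illustration, the sketch below (plain MATLAB rather than shader code) precomputes the inverse into a one-dimensional table and decodes a single texel. It reuses reinhardInverse from the previous sketch; the parameter values, variable names, and the use of interp1 are assumptions made for this example only, not the authors' implementation.

% Assumed TMO parameters (illustrative values only).
alpha = 0.18; LwH = 0.5; Lwhite = 4.0;

% Precompute the inverse tone mapping into a 1D lookup table over Ld.
nBins   = 256;
LdTable = linspace(1e-6, 1, nBins);
lut     = reinhardInverse(LdTable, alpha, LwH, Lwhite);

% Decode one texel: expand the tone mapped luminance via the table,
% add the residual stored in the alpha channel, and rescale the colors.
rgbLDR   = [0.4, 0.3, 0.2];                        % fetched tone mapped texel
residual = 0.05;                                   % residual from the alpha channel
lumD     = dot(rgbLDR, [0.2126, 0.7152, 0.0722]);  % luminance of the texel
lumW     = interp1(LdTable, lut, lumD) + residual; % expanded luminance
rgbHDR   = (lumW / lumD) * rgbLDR;                 % expanded HDR texel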
This proposed scheme was compared to RGBE [221], Munkberg et al.'s method [151], Roimela et al.'s scheme [186], and Wang et al.'s scheme [218] using HDR-VDP [135], mPSNR [151], and RMSE in the logarithmic domain [236]. The results showed that the approach presents a good trade-off between quality and compression, as well as the ability to decode textures in real time. Moreover, it has better quality on average than Wang et al.'s method, the other real-time decoding scheme, and avoids contouring artifacts, as shown in Figure 7.12. The main disadvantage of this method is that it is not able to efficiently encode the luminance and chromaticity due to limits of S3TC.

7.3.5 An Effective DXTC-Based HDR Texture Compression Scheme
Sun et al. [197] presented an S3TC/DXTC-based HDR texture compression scheme (DHTC), which separately compresses luminance and color values. The scheme works on 4 x 4 pixel blocks, as do previous block truncation compression methods.

Table 7.3. Bit allocation for the luminance values L0 and L1, the modifier table M0–M15, the per-block parameter Tindx, and the largest-channel index Ch in a 4 x 4 block in Sun et al. [197].
The encoding starts with the separation of luminance and color information using a classic YUV color space that is inspired by Munkberg et al. [151] and defined as

Y = Σᵢ wᵢ Cᵢ,    Sᵢ = (Cᵢ wᵢ) / Y,    (7.6)

where the Y values are clamped in [2⁻¹⁵, 2¹⁶] and the Sᵢ in [2⁻¹¹, 1]. This color transformation allows only three parameters to be saved, Y, U = Sr, and V = Sg, because Sb can be reconstructed from the previous ones. However, the blue channel can suffer from quantization error if it has a small value. This problem is solved by encoding the smallest values, leaving the largest for reconstruction. To save memory, the largest channel, Ch, is calculated per block:

Ch = argmax_{j ∈ {r,g,b}} Σ_{i ∈ block} Sᵢʲ,

and it is stored using 4 bits in the block (see Table 7.3).


After the color transformation, the Y values are quantized into integer values in the logarithmic space using the maximum and minimum values of each block:

Yint = (log₂ Y − L0) / (L1 − L0),    L0 = log₂ max_{i ∈ block}(Yᵢ),    L1 = log₂ min_{i ∈ block}(Yᵢ).

The variables L0 and L1 are discretized using 5 bits and stored as block information (see Table 7.3). Finally, Yint and UV are quantized at 8 bits. Note that UV can be adaptively stored in the linear or logarithmic domain per block to improve efficiency.
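A small MATLAB sketch of this color separation and the per-block log-space quantization of Y is given below; the luminance weights w, the 8-bit scaling, and all variable names are assumptions made for illustration, not the exact constants of Sun et al. [197].

% Luminance/chromaticity separation of Equation (7.6) for one 4x4 RGB block,
% followed by the per-block log-space quantization of Y.
C = 0.01 + 100 * rand(4, 4, 3);                 % example HDR block
w = [0.299, 0.587, 0.114];                      % luminance weights (assumed)

Y = w(1) * C(:,:,1) + w(2) * C(:,:,2) + w(3) * C(:,:,3);
Y = min(max(Y, 2^-15), 2^16);                   % clamp Y
S = zeros(4, 4, 3);
for j = 1:3
    S(:,:,j) = min(max(C(:,:,j) * w(j) ./ Y, 2^-11), 1);   % S_i = C_i w_i / Y
end

% Largest channel of the block, Ch (stored with 4 bits).
[~, Ch] = max([sum(sum(S(:,:,1))), sum(sum(S(:,:,2))), sum(sum(S(:,:,3)))]);

% Per-block quantization of Y in the logarithmic space (8 bits assumed);
% eps avoids a division by zero for a flat block.
L0   = log2(max(Y(:)));
L1   = log2(min(Y(:)));
Yint = round(255 * (log2(Y) - L0) / (L1 - L0 + eps));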
A further transformation is applied to improve efficiency during DXTC compression: the point translation transformation (PTT). This is obtained by forcing the color values of a block to lie on a line segment between two colors, (U0, Y0, V0) and (U1, Y1, V1). The PTT was designed offline as

m(Mindx, Tindx) = (−1)^(Mindx & 1) · 2^(Tindx >> 2) · (1 + ((Tindx & 3) + 1)(Mindx >> 1)),    (7.7)

Yt = clamp(Yint + m(Mindx, Tindx), 0, 255),

where Tindx ∈ [0, 15] is a parameter per block, and Mindx ∈ [0, 15] are parameters per pixel (see Table 7.3). These values are calculated during compression using a minimization process. Finally, the values are ready to be compressed using the standard DXT1 compression scheme (see Table 7.4).

Table 7.4. Local color bit allocation for a 4 x 4 block in Sun et al. [197], storing the two endpoint colors (U0, Y0, V0) and (U1, Y1, V1) and the per-pixel indices C0–C15.
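The bit-level definition of the modifier in Equation (7.7) can be written directly with MATLAB's bit operations; the sketch below, with the hypothetical name pttModifier, is only an illustration of the formula.

% Point translation transformation modifier of Equation (7.7).
% Mindx and Tindx are integers in [0, 15].
function m = pttModifier(Mindx, Tindx)
    s = (-1)^bitand(Mindx, 1);                      % sign from the low bit of Mindx
    m = s * 2^bitshift(Tindx, -2) * ...
        (1 + (bitand(Tindx, 3) + 1) * bitshift(Mindx, -1));
end

% Usage for one pixel, clamping the result to the 8-bit range:
% Yt = min(max(Yint + pttModifier(Mindx, Tindx), 0), 255);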
The decoding stage is divided into a few steps. When a pixel of a block needs to be decoded, firstly the YUV color is reconstructed using a DXT1 decoder. Then m(Mindx, Tindx), in Equation (7.7), is removed from the Y coordinate, which is converted from local luminance to global luminance by inverting Equation (7.8). Finally, the Y values are exponentiated (as are UV if they were encoded in the logarithmic domain), and the RGB color values are obtained by inverting Equation (7.6).
The method was compared with Munkberg et al. [151], Roimela et
al. [186], and Wang et al. [217] using mPSNR [151], log2 [RGB] RMSE [236],
and HDR-VDP (P (X) = 0.75) [134, 135] as quality metrics. On a dataset
of six images, the method was shown to be slightly better than Munkberg
et al. [151], which performed the best of the compared methods.
This method can also be used for encoding LDR textures with an alpha channel. A comparison against DXTC5 [147] showed slightly better quality under the PSNR metric on a dataset of 12 images. The main disadvantage is that hardware support for decoding these textures is present only in the latest class of graphics cards (at the time of writing) that support Direct3D 11 [147].

7.4 HDR Video Compression

HDR video compression presents some similarities with HDR image compression. Range compression using tone mapping, the reuse of LDR standards for encoding, and the use of residuals are retained. However, more sophisticated techniques are employed for exploiting the temporal coherence between frames.

7.4.1 Perception-Motivated High Dynamic Range Video Encoding


Mantiuk et al. [133] proposed one of the first solutions for storing HDR videos in an efficient way, called HDRV (see Figure 7.13). This is based on the MPEG-4 part 2 standard [150] using Xvid [237] as the starting implementation. Some stages were modified to be HDR compatible.
The first step of the encoding is the conversion of the stream from RGB/XYZ into LogLuv by Greg Ward [111] with luminance in the linear domain. Then, the luminance dynamic range is reduced, taking into account the limits of the HVS. This ensures that the quantization error is always below the visibility thresholds of the human eye. The mapping function is defined as the solution of an ordinary differential equation for ψ, the backward mapping function, which is unknown:

dψ(l)/dl = tvi(ψ(l)) / a,    ψ(0) = 10⁻⁴ cd/m²,    ψ(l_max) = 10⁸ cd/m²,    (7.8)

where l_max = 2ⁿ − 1, tvi is the threshold-versus-intensity function introduced by Ferwerda et al. [68], and a > 0 is a parameter that defines how much lower (i.e., more conservative) the maximum quantization errors are compared to tvi. Note that Equation (7.8) assumes local adaptation at the pixel.
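The backward mapping ψ has no closed form, but, assuming the form of Equation (7.8) given above, it can be integrated numerically. The sketch below uses simple forward Euler integration and a crude placeholder tvi (a constant Weber fraction) instead of the Ferwerda et al. model [68]; the number of luma bits, the value of a, and the variable names are assumptions for illustration only.

% Numerical construction of the backward mapping psi of Equation (7.8).
n    = 11;                          % assumed number of luma bits
lmax = 2^n - 1;
a    = 1.0;                         % conservativeness parameter (assumed)
tvi  = @(L) 0.01 * L;               % placeholder threshold-versus-intensity model

psi    = zeros(1, lmax + 1);
psi(1) = 1e-4;                      % psi(0) = 10^-4 cd/m^2
for l = 1:lmax
    psi(l + 1) = psi(l) + tvi(psi(l)) / a;    % d(psi)/dl = tvi(psi(l)) / a
end
% In practice a is chosen so that psi(lmax) reaches 10^8 cd/m^2; inverting
% psi gives the luminance-to-luma encoding applied before MPEG-4 coding.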

Figure 7.13. The encoding pipeline for HDRV by Mantiuk et al. [133]. (The
Napa Valley HDR environment map is courtesy of SpheronVR.)


After this step, motion estimation and interframe prediction are performed as in standard MPEG-4 part 2 (see [150] for more details). Subsequently, nonvisible frequencies are removed in the frequency domain using the discrete cosine transform (DCT). As before, this step is not modified, keeping even the same quantization matrices of the standard. However, a correction step is added after the frequency removal to avoid ringing artifacts around sharp transitions (for example, an edge between a light source and a diffuse surface). This step separately encodes strong edges into an edge map using run-length encoding and the other frequencies into DCT coefficients using variable-length encoding.
The decoding of a key frame is straightforward. Firstly, the edge map and DCT coefficients are decoded from the encoded stream. Secondly, the two signals are recombined. Thirdly, ψ is applied to the luminance channel, obtaining the final world luminance. Finally, the pixel values are converted back from the Luv color space into the XYZ/RGB color space. When P-frames or B-frames are decoded, an additional reconstruction step is added using motion vectors. See the MPEG-4 part 2 standard [150] for more details.
The method was tested using different scenes, including rendered synthetic videos, moving HDR panoramas from a Spheron camera [192], and a grayscale Silicon Vision Lars III HDR camera [210]. The results showed that HDRV can achieve compression rates of around 0.09–0.53 bpp, which is approximately double the amount of MPEG-4 with tone mapped HDR videos. HDRV does, though, outperform OpenEXR, which reaches rates of around 16–28 bpp.

7.4.2 Backward Compatible HDR-MPEG


Backward compatible HDR-MPEG is a codec for HDR videos that was introduced by Mantiuk et al. [136]. As in the case of JPEG-HDR, this algorithm is an extension of the MPEG-4 part 2 standard [150] that works on top of the standard encoding/decoding stage, allowing backward compatibility. In a similar way to JPEG-HDR, each frame is divided into an LDR part, using tone mapping, and an HDR part; but, in this case, a numerical inverse TMO is employed for the reconstruction function (RF) instead of an RI.
The encoding stage takes as input an HDR video in the XYZ color space and, as the first step, applies tone mapping to each HDR frame, obtaining LDR frames. Figure 7.14 shows the complete pipeline. These are coded using Xvid, an MPEG-4 part 2 implementation, stored in an LDR stream, and finally decoded to obtain uncompressed, MPEG-quantized frames. After this, the LDR frame and the HDR frame are converted to a common color space. For both HDR and LDR frames, CIE 1976 uniform chromaticity scale (u′, v′) coordinates are used to code


Figure 7.14. The encoding pipeline for backward compatible HDR-MPEG by


Mantiuk et al. [136]. (The Napa Valley HDR environment map is courtesy of
SpheronVR GmbH.)

chroma. While the nonlinear luma of sRGB is used for LDR pixels, a different luma coding is used for HDR pixels because the sRGB nonlinearity is not suitable for high luminance ranges [10⁻⁵, 10¹⁰] (see [136]). This luma coding, at 12 bits, for HDR luminance values is given as

lw = f(Lw) = 209.16 log(Lw) − 731.28          if Lw ≥ 10469,
             826.81 Lw^0.10013 − 884.17       if 5.6046 ≤ Lw < 10469,    (7.9)
             17.554 Lw                        if Lw < 5.6046.
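Equation (7.9) is easy to apply element-wise to a luminance map; the following MATLAB sketch (with the hypothetical name lumaEncode) is a direct transcription of the three branches.

% 12-bit luma coding of Equation (7.9) for HDR luminance values Lw (cd/m^2).
function lw = lumaEncode(Lw)
    lw  = zeros(size(Lw));
    hi  = Lw >= 10469;
    mid = Lw >= 5.6046 & Lw < 10469;
    lo  = Lw < 5.6046;
    lw(hi)  = 209.16 * log(Lw(hi)) - 731.28;
    lw(mid) = 826.81 * Lw(mid).^0.10013 - 884.17;
    lw(lo)  = 17.554 * Lw(lo);
end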
At this point, the HDR and LDR frames are in a comparable color space. Then RF, which maps LDR values, ld, to HDR ones, lw, is calculated by averaging the lw values that fall into each of the 256 bins representing the ld values:

RF(i) = (1 / |Ω(i)|) Σ_{x ∈ Ω(i)} lw(x),    where Ω(i) = {x | ld(x) = i},

and where i ∈ [0, 255] is the index of a bin. The RF for chromaticity is approximated by (u′d, v′d) = (u′w, v′w). Once the RFs are calculated for all frames, they are stored in an auxiliary stream using Huffman encoding. After this stage, a residual image is calculated to improve overall quality, especially in small details for luma:

rl(x) = lw(x) − RF(ld(x)).
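A compact MATLAB sketch of the binning is shown below; lw and ld stand for the HDR and the decoded LDR luma images of one frame, and the example data, loop structure, and variable names are assumptions for illustration only.

% Reconstruction function RF: average the HDR luma values falling into
% each of the 256 bins of LDR values, then form the luma residual.
lw = 4095 * rand(64, 64);                   % example HDR luma frame
ld = uint8(255 * (lw / max(lw(:))));        % example decoded LDR luma frame

RF = zeros(1, 256);
for i = 0:255
    idx = (ld == i);                        % bin Omega(i) = {x | ld(x) = i}
    if any(idx(:))
        RF(i + 1) = mean(lw(idx));
    end
end
rl = lw - RF(double(ld) + 1);               % residual r_l(x) = l_w(x) - RF(l_d(x))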


Figure 7.15. The decoding pipeline for backward compatible HDR-MPEG by


Mantiuk et al. [136]. (The Napa Valley HDR environment map is courtesy of
SpheronVR.)

The residual image is discretized at 8 bits, using a quantization factor that differs for each bin and is based on the bin's maximum magnitude value. This leads to

r̂l(x) = clamp(⌊rl(x) / q(m)⌋, −127, 127),    where m is the index of the bin Ωm to which x belongs,

and where q(m) is the quantization factor, which is calculated for a bin Ωm as

q(m) = max(qmin, max_{x ∈ Ωm}(|rl(x)|) / 127).
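Continuing the previous sketch, the per-bin quantization of the residual can be written as follows (qmin and the variable names are illustrative assumptions):

% Quantize the residual r_l with a per-bin factor q(m) derived from the
% maximum residual magnitude of that bin, clamping to [-127, 127].
qmin = 1.0;
rhat = zeros(size(rl));
for m = 0:255
    idx = (ld == m);
    if any(idx(:))
        q = max(qmin, max(abs(rl(idx))) / 127);
        rhat(idx) = max(min(round(rl(idx) / q), 127), -127);
    end
end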
Then rl needs to be compressed into a stream using MPEG. A naive compression would generate a low compression rate, because a large amount of high frequencies is present in rl. In order to improve the compression rate, the image is filtered, removing frequencies in regions where the HVS cannot perceive any difference. This is achieved using the original HDR frame as guidance in the filtering. The filtering is performed in the wavelet domain, and it is applied only to the three finest scales, modeling contrast masking and the lower sensitivity to high frequencies.
To decode, the MPEG streams (tone mapped video and residuals) and the RF streams are decoded. The complete pipeline is shown in Figure 7.15. Then, an HDR frame is reconstructed by firstly applying its RF to the decoded LDR frame and secondly adding the residuals to the expanded LDR frame. Finally, the CIE Luv values are converted to XYZ ones by inverting Equation (7.9) for luminance.


HDR-MPEG was evaluated using three different metrics: HDR-VDP [134, 135], the universal image quality index (UQI) [216], and the classic signal-to-noise ratio (SNR). As in the case of JPEG-HDR, there was a first study that explored the influence of the TMO on quality/bit rate. This experiment was performed using the time-dependent visual adaptation operator [168], the fast bilateral filtering operator [62], the photographic tone reproduction operator [180], the gradient domain operator [67], and the adaptive logarithmic mapping operator [60]. These TMOs were modified to avoid temporal flickering and applied to a stream using default parameters. The conclusion of the study was that most of these TMOs have similar performance except the gradient domain one, which created larger streams. However, this TMO generated images better suited for backward compatibility. Therefore, the choice of a TMO for the video compression depends on the trade-off between bit rate and the backward compatible quality. The second study compared HDR-MPEG against HDRV [133] and JPEG-HDR [220] using the photographic tone reproduction operator as the TMO [180]. The results showed that HDR-MPEG has a better quality than JPEG-HDR, but a similar one to HDRV.

7.4.3 Rate-Distortion Optimized Compression of High Dynamic Range Videos
Lee and Kim [116] proposed a backward compatible HDR video scheme based on tone mapping and residuals. The new key contribution is the use of a temporally coherent TMO to avoid flickering and an optimization process to automatically allocate bits to the tone mapped frames and the residuals.
The first step of the encoding is to tone map each frame of the stream with a temporally coherent version of the gradient domain operator [115]. Figure 7.16 shows the full pipeline. After tone mapping, the stream is encoded using the H.264 standard [233], and the luminance residuals are encoded as

R(x) = log₂(Lw(x) / Ld(x) + ε),    ε > 0,

where Lw(x) is the HDR luminance at pixel coordinate x of the original frame, and Ld(x) is the decoded luminance of the LDR stream at the same pixel coordinate and frame. The calculation of R can lead to noise due to quantization, which is removed by filtering it using the cross bilateral filter with Lw(x) as guidance. Finally, the residual stream is encoded using H.264 as well.
When encoding using H.264, the bit rates of the two streams are not the same but are optimized in order to increase compression rates. The quantization parameters for the LDR sequence, QPd, and the ratio sequence, QPratio, are calculated such that the distortions of the reconstructed

Figure 7.16. The encoding pipeline for Rate-Distortion Optimized Compression of High Dynamic Range Videos by Lee and Kim [116]. (The Napa Valley HDR environment map is courtesy of SpheronVR.)

LDR frames, Dd, and those of the reconstructed HDR frames, Dw, are minimized. This problem can be defined as a Lagrangian multiplier minimization problem:

J = Dd + λ Dw + μ (Rd + Rratio),

where λ and μ are the two Lagrangian multipliers. From the analysis of J, the authors found a formula for controlling the quality of the ratio stream:

QPratio = 0.77 QPd + 13.42.

When decoding, the two H.264 streams are decoded, and the original frame is calculated as

(Rw(x), Gw(x), Bw(x))^T = (Rd(x), Gd(x), Bd(x))^T · 2^R(x).
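The residual stream and its use at decode time amount to a per-pixel luminance ratio in the log domain; a MATLAB sketch is given below (the value of ε, the example data, and the omission of the cross bilateral denoising step are assumptions for illustration).

% Luminance ratio residual of Lee and Kim's scheme and its inverse.
epsilon = 1e-6;
Lw = 1000 * rand(64, 64) + epsilon;          % example original HDR luminance
Ld = Lw / max(Lw(:));                        % example decoded LDR luminance

R = log2(Lw ./ Ld + epsilon);                % residual stream (before H.264 coding)

% Reconstruction: scale each decoded LDR channel by 2^R(x).
rgbLDR = cat(3, Ld, Ld, Ld);                 % stand-in decoded LDR frame
rgbHDR = rgbLDR .* repmat(2.^R, [1, 1, 3]);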
This compression method was evaluated against MPEG-HDR [136]. The metrics used were PSNR for the tone mapped, backward compatible frames, and HDR-VDP [134] for the HDR frames. The results showed that while the proposed method has better quality than MPEG-HDR at low bit rates for HDR frames (on average 10% less HDR-VDP error), MPEG-HDR has better quality at bit rates higher than 1 bpp (on average 25%). Regarding tone mapped frames, the rate-distortion optimized method has on average more than 10 dB better quality than MPEG-HDR at any bit rate. Finally, the authors analyzed the bit rates of the tone mapped and residual streams and showed that, on average, 10–30% more space is needed for supporting HDR videos.


Name            BPP         Quality   Backward Compatibility
IMAGE COMPRESSION
JPEG-HDR        0.6–3.75    MQ-HQ     Yes
HDR-JPEG2000    0.48–4.8    HQ        Yes
TLCAHDR         1–8         HQ        Partial
TEXTURE COMPRESSION
HDRTGS          8           HQ        No
HDRTBIO         8           HQ        No
HDRTSL H        16          MQ        No
HDRTTMITM H     4–8         MQ-HQ     Yes
DHTC            8           HQ        No
VIDEO COMPRESSION
HDRV            0.095       HQ        No
MPEG-HDR        0.26        HQ        Yes
H.264-HDR       0.264       HQ        Yes

Table 7.5. Summary of the various HDR content compression techniques for images, textures, and videos. Each column provides the bpp (a range in the case of varying quality), the quality based on the results of the original papers (MQ means medium quality, HQ means high quality; note that a range of quality is related to the bpp range), and backward compatibility. H means hardware support in the case of textures. See Table 7.6 for a clarification of the key.
Key             Name
JPEG-HDR        Backward compatible JPEG-HDR [219, 220]
HDR-JPEG2000    HDR-JPEG2000 [236]
TLCAHDR         Two-Layer Coding Algorithm for High Dynamic Range Images [159]
HDRTGS          HDR Textures Compression Using Geometry Shapes [151]
HDRTBIO         HDR Texture Compression Using Bit and Integer Operations [186]
HDRTSL          HDR Texture Compression Encoding LDR and HDR Parts [217]
HDRTTMITM       HDR Textures Compression with Tone Mapping and Its Analytic Inverse [21]
DHTC            An Effective DXTC-Based HDR Texture Compression Scheme [197]
HDRV            Perception-Motivated High Dynamic Range Video Encoding [133]
MPEG-HDR        Backward compatible HDR-MPEG [136]
H.264-HDR       Rate-Distortion Optimized Compression of High Dynamic Range Videos [116]

Table 7.6. Key to the HDR content compression techniques of Table 7.5.

7.5 Summary

Captured real-world lighting results in very large data sizes. Uncompressed, a single HDR pixel requires 12 bytes of memory to store the three single-precision floating-point numbers for the RGB values. A single High Definition frame of 1920 x 1080 pixels thus needs around 24 MB. With an HDR video recording at a modest 30 frames a second, this becomes a lot of data. Without compression, it is simply not going to be feasible to store this amount of data, let alone transmit it. Although terrestrial broadband is regularly increasing in bandwidth, even this is unlikely to be sufficient to broadcast any uncompressed HDR films in the future.
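These figures are easy to verify; for instance, in MATLAB:

% Uncompressed HDR data rate for the example in the text.
bytesPerPixel = 3 * 4;                               % three single precision floats
frameMB       = 1920 * 1080 * bytesPerPixel / 2^20;  % about 23.7 MB per HD frame
rateMBps      = 30 * frameMB;                        % about 712 MB per second at 30 fps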
Not only does the compression have to significantly reduce the amount of data that needs storing and transmitting, but it should do so with minimal loss of visual quality. This is a major challenge, as the increased luminance means that many artifacts which may not be noticed in LDR content will be easily discernible in HDR.
Finally, the compression should also preferably happen at real-time rates, to avoid both specialized hardware capable of coping with the huge amount of raw data and the need for subsequent offline compression. For example, the HDR video system showcased by the University of Warwick and SpheronVR [33] requires a dedicated fiber optic cable coupled to a 24 TB disk array. Only by striping the data to the disk array is the device capable of coping with the huge amount of data that is coming from the camera.
Table 7.5 summarizes what has been achieved to date for compressing
HDR images, textures, and videos. An important question that has yet
to be resolved is whether new HDR compression algorithms really do need
to be backward compatible with existing LDR versions. Ensuring this
backward compatibility has the danger of severely limiting the potential
novelty and performance of new HDR algorithms. As the work presented
in this chapter has shown, HDR compression is a challenging problem.
Much more research is now needed if HDR images and video are ever to
become the norm in applications, such as television and video games,
and especially for mobile devices.


A
The Bilateral Filter

The bilateral filter is a nonlinear filter proposed by Tomasi and Manduchi [201] that keeps strong edges while smoothing internal values. The filter for an image I is defined as

I′(x) = B(I, f, g) = (1 / k(x)) Σ_y I(y) f(x − y) g(I(y) − I(x)),
k(x) = Σ_y f(x − y) g(I(y) − I(x)),    (A.1)

where I′ is the filtered image, f is the smoothing function for the spatial term, g is the smoothing function for the intensity term, and k(x) is the normalization term. In the standard notation, the parameters for f are indicated with σs, and the ones for g with σr. An example of the application of the filter is shown in Figure A.1.
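A brute-force MATLAB implementation of Equation (A.1) for a grayscale image, with Gaussian f and g, is sketched below; it is meant only to make the definition concrete, and the function name is hypothetical (the HDR Toolbox provides its own bilateral decomposition functions).

% Brute-force bilateral filter for a grayscale image I with Gaussian
% spatial (sigma_s) and intensity (sigma_r) kernels, as in Equation (A.1).
function If = bilateralBruteForce(I, sigma_s, sigma_r)
    [h, w] = size(I);
    r  = ceil(2 * sigma_s);                       % half size of the window
    If = zeros(h, w);
    for x = 1:w
        for y = 1:h
            x0 = max(1, x - r); x1 = min(w, x + r);
            y0 = max(1, y - r); y1 = min(h, y + r);
            P  = I(y0:y1, x0:x1);                 % local window
            [X, Y] = meshgrid(x0:x1, y0:y1);
            f = exp(-((X - x).^2 + (Y - y).^2) / (2 * sigma_s^2));   % spatial term
            g = exp(-(P - I(y, x)).^2 / (2 * sigma_r^2));            % intensity term
            k = sum(sum(f .* g));                 % normalization k(x)
            If(y, x) = sum(sum(P .* f .* g)) / k;
        end
    end
end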
Moreover, bilateral filtering can be used for transferring edges from a source image to a target image by modifying Equation (A.1) as

I′(x) = B(I, J, f, g) = (1 / k(x)) Σ_y I(y) f(x − y) g(J(y) − J(x)),
k(x) = Σ_y f(x − y) g(J(y) − J(x)),

where J is the image containing the edges to be transferred to I. This version is called joint/cross bilateral filtering [102, 172]. When this version is used for up-sampling, it is called joint bilateral up-sampling [102]. This is a general technique to speed up the computation of various tasks, such as stereo matching, tone mapping, global illumination, etc. The task is computed


Figure A.1. An example of the bilateral filter. (a) An edge corrupted by additive Gaussian noise with σ = 0.25. (b) A spatial Gaussian kernel with σs = 4 evaluated at the central pixel. (c) The image in (a) filtered with the Gaussian kernel in (b); note that the noise is removed but edges are smoothed, causing a blur effect. (d) An intensity Gaussian kernel with σr = 0.5 evaluated at the central pixel. (e) A spatial-intensity Gaussian kernel obtained by multiplying the kernels in (b) and (d), called a bilateral kernel. (f) The image in (a) filtered using the bilateral kernel; note that while edges are preserved, the noise is removed.

on a small scale and then it is up-sampled using the starting full resolution
image or other features (for example, normal or depth values in the case of
global illumination). See Figure A.2.


Figure A.2. An example of joint bilateral up-sampling for rendering. (a) The starting low resolution image representing indirect lighting. (b) A depth map used as the edge map. (c) The up-sampled version of (a), transferring the edges of (b), with direct lighting added.


The complexity of the bilateral filter is O(nk²), where n is the number of pixels in the image and k is the size of the biggest kernel amongst the spatial and intensity kernels. This complexity is very high, which means that a large filter can take several minutes to compute for a megapixel image. Moreover, the filter cannot be separated as in the case of a Gaussian filter, because it is nonlinear. Separation can work only for small kernels (for example, filters with a window smaller than ten pixels [173]). For bigger kernels, artifacts can be visible in the final image around edges. When the kernels are not Gaussians, the bilateral filter can be accelerated by reorganizing computations, because they are similar for neighbors of the current pixel [230]. A more general class of speed-up algorithms approximates the filtering using the so-called splat-blur-slice paradigm [1, 2, 34]. In such methods, the image is seen as a set of multidimensional points, where intensity or colors are coordinates, similar to the spatial coordinates. Depending on the size of the filtering kernels, points are projected onto a low resolution spatial data structure (splat) representing the multidimensional space of the image. At this point, filtering is computed on this data structure (blur), and then the computed values are redistributed to the original pixels (slice).
There are three main disadvantages of the bilateral filter. The first is that it smoothes across sharp changes in gradients, blunting or blurring ramp edges and valley- or ridge-like features. The second is that high gradient or high curvature regions are poorly smoothed, because most nearby values are outliers that miss the filter window. The third is that it may include disjoint domains or high gradient regions.
To solve all these problems, the bilateral filter was extended by Choudhury and Tumblin [36]. The new filter, called a trilateral filter, tilts and skews the filter window using bilateral-filtered image gradients ∇I (see Figure A.3). This is achieved by using as input for g the closeness of the


Figure A.3. A comparison between the bilateral and the trilateral filter. (a) The input noisy signal. (b) The signal in (a) smoothed using the bilateral filter. (c) The signal in (a) smoothed using the trilateral filter. Note that ramps and ridges are kept instead of being smoothed, as happens in (b).


center value I(x) to a plane instead of the Euclidean distance of coordinates. This plane is defined as

P(x, y) = I(x) + ∇I(x) · y,

where x are the coordinates of the point to filter, and y are the coordinates of a sample in the window. The only disadvantage of this filter is its high computational cost, because two bilateral filters need to be calculated: one for the gradients and another one for filtering the image values.

B
Retinex Filters

The Retinex theory developed by Land [108] explains how the HVS extracts reliable information from the real world when illumination changes appear and, in general, acts as a model of the HVS for color constancy. This theory is based on psychophysical experiments from which Land showed that there is a correlation between the amount of radiation falling on the retina and the apparent lightness of a surface.
In this appendix, we show the basic Retinex filters as proposed by Rahman et al. [177]. The single-scale Retinex filter is the logarithmic difference between a color channel and its version convolved with a Gaussian surround:

Ri(x) = log Ci(x) − log(Gσ(x) ⊗ Ci(x)),

where Ci is the ith color channel of the input image, and Gσ is a Gaussian kernel with standard deviation σ.
In the multiscale Retinex approach, more color bands are used, such as

Rm,i(x) = Σ_{k=1}^{N} wk Rk,i(x),

where N is the number of scales, and wk is the weight of the kth scale. The final image is computed by multiplying Rm,i by the color restoration term, C′i:

R′m,i(x) = Rm,i(x) C′i(x),

where C′i is defined as

C′i(x) = f( Ci(x) / Σ_{i=1}^{3} Ci(x) ).


Figure B.1. An example of multiscale Retinex. (a) The original LDR image. (b) The image processed with a multiscale Retinex operator using eight scales.

The variable f is a linear or nonlinear function that controls the saturation of the final image. An example of a multiscale Retinex operator applied to an LDR image is shown in Figure B.1.
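A MATLAB sketch of the single-scale and multiscale Retinex defined above is given below; the choice of scales, the uniform weights, and taking f as the identity are assumptions for illustration only (the example uses the peppers.png image shipped with MATLAB and the Image Processing Toolbox functions fspecial and imfilter).

% Multiscale Retinex with color restoration for an LDR image.
Ci     = im2double(imread('peppers.png'));          % input image
sigmas = [15, 80, 250];                             % assumed scales
wk     = ones(1, numel(sigmas)) / numel(sigmas);    % uniform weights (assumed)

Rm = zeros(size(Ci));
for k = 1:numel(sigmas)
    G = fspecial('gaussian', 2 * ceil(3 * sigmas(k)) + 1, sigmas(k));
    for c = 1:3
        blurred   = imfilter(Ci(:,:,c), G, 'replicate');          % G_sigma * C_i
        Rm(:,:,c) = Rm(:,:,c) + wk(k) * (log(Ci(:,:,c) + eps) - log(blurred + eps));
    end
end

% Color restoration term C'_i (with f taken as the identity) and final output.
Csum = sum(Ci, 3) + eps;
Rout = Rm .* (Ci ./ repmat(Csum, [1, 1, 3]));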

C
A Brief Overview of the
MATLAB HDR Toolbox

This appendix describes how to use the HDR Toolbox used in this book. The toolbox is self-contained, although it uses some functions from the Image Processing Toolbox by MathWorks [141]. Note that the HDR built-in functions of MATLAB, such as hdrread.m, hdrwrite.m, and tonemap.m, need MATLAB version 2008b or above. Furthermore, MATLAB version 2010a and the respective Image Processing Toolbox are needed for compressing images with HDR JPEG2000. The initial step when handling HDR images/frames in MATLAB is to load them. The HDR Toolbox provides the function hdrimread.m to read HDR images. This function takes as input a MATLAB string and outputs an m x n x 3 matrix. An example of how to call this function is:
>> img = hdrimread('memorial.pfm');
Note that hdrimread.m can read portable float map files (.pfm) and uncompressed radiance files (.hdr/.pic). Moreover, this function can read all LDR formats that are supported natively by MATLAB, and it automatically stores them in the range [0, 1] with double precision. MATLAB from version 2008b onwards provides support for reading radiance files, both compressed (using run-length encoding) and uncompressed. An example of how to use this function for loading memorial.hdr is:
>> img = hdrread('memorial.hdr');
Once images are loaded into memory, a useful operation is to visualize them. A simple operation that allows single-exposure images to be shown is GammaTMO.m. This function applies gamma correction to an HDR image at a given f-stop value and visualizes it on the screen. Note that values are clamped between [0, 1]. For example, if we want to display an HDR image

C. A Brief Overview of the MATLAB HDR Toolbox

Figure C.1. The Memorial HDR image gamma corrected, with a setting of 2.2 for
display at f-stop 7. (The original HDR image is courtesy of Paul Debevec [50].)

at f-stop 7, with gamma correction 2.2, we just have to type the following on the MATLAB console:
>> GammaTMO(img, 2.2, -7, 1);
The result of this operation can be seen in Figure C.1. In case we want to save this gamma-corrected exposure into a matrix, we just need to set the visualization flag to 0:
>> imgOut = GammaTMO(img, 2.2, -7, 0);
Gamma-corrected single-exposure images are a straightforward way to view HDR images, but they do not permit the large range of luminance in an HDR image to be properly viewed. The HDR Toolbox provides several TMOs that can be used to compress the luminance so that it can be visualized on an LDR monitor. For example, if we want to tone map an image using Drago et al.'s operator [60], the DragoTMO.m function is used and the image is saved into a temporary variable. Then, this image is visualized using the GammaTMO.m function as shown before:


Figure C.2. The Memorial HDR image tone mapped with Drago's TMO [60]. (The original HDR image is courtesy of Paul Debevec [50].)

>> imgTMO = DragoTMO(img);


>> GammaTMO(imgTMO, 2.2, 0, 1);
The result of this can be seen in Figure C.2. If the tone mapping parameters are not satisfactory, they can be changed. Each TMO can be queried with the command help, which can be used to understand which parameters can be set and what these parameters do. The help is called using the command help nameFunction. We demonstrate the previous example with the call to help. Figure C.3 shows the resulting image.
>> help DragoTMO;
>> imgTMO = DragoTMO(img, 100.0, 0.5);
>> GammaTMO(imgTMO, 2.2, 0, 1);
Furthermore, if colors in tone mapped images are too saturated, due to the fact that the range is compressed, they can be adjusted using ColorCorrection.m. This function increases or decreases the saturation in the image. In our case, the image needs to be desaturated. To achieve that, we need to use a color correction value in [0, 1] (see Figure C.4).


Figure C.3. The Memorial HDR image tone mapped with Drago's TMO [60], changing the bias parameter to 0.5. (The original HDR image is courtesy of Paul Debevec [50].)

>> imgTMO = DragoTMO(img, 100.0, 0.5);
>> imgTMOCor = ColorCorrection(imgTMO, 1.0, 0.4);
>> GammaTMO(imgTMOCor, 2.2, 0, 1);
Expansion operators provide a tool for creating HDR content from LDR images. Similarly to TMOs, expansion operators are provided with help for checking parameters. For example, if we want to expand an LDR image (peppers.jpg) with the Landis operator [109], we firstly need to load it (hdrimread.m can be used because it casts 8-bit images into double precision and maps them into [0, 1]), and then the operator can be executed to expand the image with the desired parameters (after consulting help).
>> img = hdrimread('peppers.jpg');
>> help LandisEO
>> imgEXP = LandisEO(img, 2.3, 0.5, 10, 2.2);
EOs can reduce saturation during expansion, the opposite of what happens during tone mapping. In this case, ColorCorrection.m can


Figure C.4. The Memorial HDR image tone mapped with Drago's TMO [60] followed by a color correction step. (The original HDR image is courtesy of Paul Debevec [50].)

be used to increase the saturation, using a correction value greater than one, such as:
>> imgEXP = LandisEO(img, 2.3, 0.5, 10, 2.2);
>> imgCor = ColorCorrection(imgEXP, 1.4);
At a certain point, loaded, tone mapped, and expanded images need to be stored on the hard disk. The HDR Toolbox has a native function to write .pfm and .hdr files (without compression), which is called hdrimwrite.m. For instance, if we want to write an image to the drive as a .pfm file, we just need to call hdrimwrite.m:
>> hdrimwrite(img, 'out.pfm');
If we want to store the image as an .hdr file, we just need to use the appropriate file extension.
Note that MATLAB provides a native function to write .hdr files with compression, which is called hdrwrite.m:
>> hdrwrite(img, 'out.hdr');


The HDR Toolbox provides other functions for manipulating HDR images, including bilateral decomposition, histogram calculation, merging of LDR images into HDR images, light source sampling, HDR compression, etc. All these functions are straightforward to use (please see each function's help for a description of the function, its parameters, and its outputs).

Bibliography

[1] Andrew Adams, Natasha Gelfand, Jennifer Dolson, and Marc Levoy. Gaussian KD-trees for Fast High-Dimensional Filtering. ACM Trans. Graph. 28:3 (2009), 112.
[2] Andrew Adams, Jongmin Baek, and Myers Abraham Davis. Fast High-Dimensional Filtering Using the Permutohedral Lattice. Computer Graphics Forum 29:2 (2010), 753–762.
[3] Ansel Adams. The Print: The Ansel Adams Photography Series 3. Cambridge, MA, USA: Little, Brown and Company, 1981.
[4] Edward H. Adelson and James R. Bergen. The Plenoptic Function and the
Elements of Early Vision. In Computational Models of Visual Processing,
pp. 320. Cambridge, MA, USA: MIT Press, 1991.
[5] Edward H. Adelson. Saturation and Adaptation in the Rod System. Vision Research 22 (1982), 12991312.
[6] Adobe. Adobe PhotoShop. Available at http://www.adobe.com/it/
products/photoshop/photoshop/, 2008.
[7] Sameer Agarwal, Ravi Ramamoorthi, Serge Belongie, and Henrik Wann
Jensen. Structured Importance Sampling of Environment Maps. ACM
Trans. on Graph. 22:3 (2003), 605612.
[8] Aseem Agarwala, Mira Dontcheva, Maneesh Agrawala, Steven Drucker, Alex
Colburn, Brian Curless, David Salesin, and Michael Cohen. Interactive
Digital Photomontage. ACM Trans. Graph. 23:3 (2004), 294302.
[9] Manoj Aggarwal and Narendra Ahuja. Split Aperture Imaging for High
Dynamic Range. Int. J. Comput. Vision 58:1 (2004), 717.
[10] Tomas Akenine-Möller, Eric Haines, and Naty Hoffman. Real-Time Rendering, Third Edition. Natick, MA, USA: A K Peters, Ltd., 2008.
[11] Ahmet Oğuz Akyüz and Erik Reinhard. Color Appearance in High-Dynamic-Range Imaging. Journal of Electronic Imaging 15:3 (2006), 033001-1–033001-12.


[12] Ahmet Oğuz Akyüz and Erik Reinhard. Noise Reduction in High Dynamic Range Imaging. Journal of Visual Communication and Image Representation 18:5 (2007), 366–376.
[13] Ahmet Oğuz Akyüz and Erik Reinhard. Perceptual Evaluation of Tone-Reproduction Operators Using the Cornsweet-Craik-O'Brien Illusion. ACM Transaction on Applied Perception 4:4 (2008), 129.
[14] Ahmet Oğuz Akyüz, Roland Fleming, Bernhard E. Riecke, Erik Reinhard, and Heinrich H. Bülthoff. Do HDR Displays Support LDR Content?: A Psychophysical Evaluation. ACM Trans. Graph. 26:3 (2007), 38.
[15] David Alleysson and Sabine Süsstrunk. On Adaptive Non-linearity for Color Discrimination and Chromatic Adaptation. In Proceedings of the First European Conf. on Color in Graphics, Image, and Vision, pp. 190–195. Poitiers, France: The Society for Imaging Science and Technology, 2002.
[16] Michael Ashikhmin and Jay Goyal. A Reality Check for Tone-Mapping
Operators. ACM Transaction on Applied Perception 3:4 (2006), 399411.
[17] Michael Ashikhmin. A Tone Mapping Algorithm for High Contrast Images. In EGRW 02: Proceedings of the 13th Eurographics Workshop on
Rendering, pp. 145156. Aire-la-Ville, Switzerland: Eurographics Association, 2002.
[18] Tunc Ozan Aydin, Rafal Mantiuk, Karol Myszkowski, and Hans-Peter Seidel. Dynamic Range Independent Image Quality Assessment. ACM Trans.
Graph. 27:3 (2008), 110.
[19] Francesco Banterle, Patrick Ledda, Kurt Debattista, and Alan Chalmers.
Inverse Tone Mapping. In GRAPHITE 06: Proceedings of the 4th International Conference on Computer Graphics and Interactive Techniques in
Australasia and Southeast Asia, pp. 349356. New York, NY, USA: ACM,
2006.
[20] Francesco Banterle, Patrick Ledda, Kurt Debattista, Alan Chalmers, and
Marina Bloj. A Framework for Inverse Tone Mapping. The Visual Computer 23:7 (2007), 467478.
[21] Francesco Banterle, Kurt Debattista, Patrick Ledda, and Alan Chalmers. A
GPU-Friendly Method for High Dynamic Range Texture Compression Using
Inverse Tone Mapping. In GI 08: Proceedings of Graphics Interface 2008,
pp. 4148. Toronto, Ontario, Canada: Canadian Information Processing
Society, 2008.
[22] Francesco Banterle, Patrick Ledda, Kurt Debattista, and Alan Chalmers.
Expanding Low Dynamic Range Videos for High Dynamic Range Applications. In SCCG 08: Proceedings of the 4th Spring Conference on Computer
Graphics, pp. 349356. New York, NY, USA: ACM, 2008.
[23] Francesco Banterle, Patrick Ledda, Kurt Debattista, Alessandro Artusi, Marina Bloj, and Alan Chalmers. A Psychophysical Evaluation of Inverse Tone
Mapping Techniques. Computer Graphics Forum 28:1 (2009), 1325.


[24] Francesco Banterle. Inverse Tone Mapping. Ph.D. thesis, University of


Warwick, 2009.
[25] Marcelo Bertalmio, Liminita A. Vese, Gullermo Sapiro, and Stanley J. Osher. Simultaneous Structure and Texture Image in Painting. IEEE Transactions on Image Processing 12:8 (2003), 882889.
[26] James F. Blinn and Martin E. Newell. Texture and Reection in Computer
Generated Images. In SIGGRAPH 76: Proceedings of the 3rd Annual
Conference on Computer Graphics and Interactive Techniques, pp. 266266.
New York, NY, USA: ACM, 1976.
[27] Gustav J. Braun and Mark D. Fairchild. Image Lightness Rescaling Using
Sigmoidal Contrast Enhancement Functions. Journal of Electronic Imaging
8 (1999), 380393.
[28] David Burke, Abhijeet Ghosh, and Wolfgang Heidrich. Bidirectional Importance Sampling for Direct Illumination. In Rendering Techniques 2005
Eurographics Symposium on Rendering, pp. 147156. Aire-la-Ville, Switzerland: Eurographics Association, 2005.
[29] Peter J. Burt and Edward H. Adelson. The Laplacian Pyramid as a Compact Image Code. In Readings in Computer Vision: Issues, Problems,
Principles, and Paradigms, pp. 671679. San Francisco, CA, USA: Morgan
Kaufmann Publishers Inc., 1987.
[30] Martin Čadík, Michael Wimmer, Laszlo Neumann, and Alessandro Artusi. Image Attributes and Quality for Evaluation of Tone Mapping Operators. In Proceedings of the 14th Pacific Conference on Computer Graphics and Applications, pp. 35–44. Taipei, Taiwan: National Taiwan University Press, 2006.
[31] Martin Čadík, Michael Wimmer, Laszlo Neumann, and Alessandro Artusi. Evaluation of HDR Tone Mapping Methods Using Essential Perceptual Attributes. Computers & Graphics 32 (2008), 330–349.
[32] Edwin Earl Catmull. A Subdivision Algorithm for Computer Display of
Curved Surfaces. Ph.D. thesis, University of Utah, 1974.
[33] Alan Chalmers, Gerhard Bonnet, Francesco Banterle, Piotr Dubla, Kurt Debattista, Alessandro Artusi, and Christopher Moir. High-Dynamic-Range
Video Solution. In SIGGRAPH ASIA 09: ACM SIGGRAPH ASIA 2009
Art Gallery & Emerging Technologies: Adaptation, pp. 7171. New York,
NY, USA: ACM, 2009.
[34] Jiawen Chen, Sylvain Paris, and Fredo Durand. Real-Time Edge-Aware
Image Processing with the Bilateral Grid. ACM Trans. Graph. 26:3 (2007),
103.
[35] Kenneth Chiu, M. Herf, Peter Shirley, S. Swamy, Changyaw Wang, and
Kurt Zimmerman. Spatially Nonuniform Scaling Functions for High Contrast Images. In Proceedings of Graphics Interface 93, pp. 245253. San
Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993.


[36] Prasun Choudhury and Jack Tumblin. The Trilateral Filter for High Contrast Images and Meshes. In EGRW 03: Proceedings of the 14th Eurographics Workshop on Rendering, pp. 186196. Aire-la-Ville, Switzerland:
Eurographics Association, 2003.
[37] Charilaos Christopoulos, Athanassios Skodras, and Touradj Ebrahimi. The
JPEG2000 Still Image Coding System: An Overview. IEEE Transactions
on Consumer Electronics 46:4 (2000), 11031127.
[38] CIE. Commission Internationale de lEclairage. Available at http://www.
cie.co.at, 2008.
[39] Petrik Clarberg and Tomas Akenine-Möller. Exploiting Visibility Correlation in Direct Illumination. Computer Graphics Forum (Proceedings of EGSR 2008) 27:4 (2008), 1125–1136.
[40] Petrik Clarberg and Tomas Akenine-Möller. Practical Product Importance Sampling for Direct Illumination. Computer Graphics Forum (Proceedings of Eurographics 2008) 27:2 (2008), 681–690.
[41] Petrik Clarberg, Wojciech Jarosz, Tomas Akenine-Möller, and Henrik Wann Jensen. Wavelet Importance Sampling: Efficiently Evaluating Products of Complex Functions. ACM Trans. Graph. 24:3 (2005), 1166–1175.
[42] Tom Cornsweet. Visual Perception. New York, NY, USA: Academic Press,
1970.
[43] Massimiliano Corsini, Marco Callieri, and Paolo Cignoni. Stereo Light
Probe. Computer Graphics Forum 27:2 (2008), 291300. Available online
(http://vcg.isti.cnr.it/Publications/2008/CCC08).
[44] Crytek. Crysis. Available at http://www.crysis-game.com/, 2008.
[45] CypressSemiconductor. LUPA 1300-2. Available at http://www.cypress.
com/, 2008.
[46] Scott Daly and Xiaofan Feng. Bit-Depth Extension Using Spatiotemporal
Microdither Based on Models of the Equivalent Input Noise of the Visual
System. In Proceedings of Color Imaging VIII: Processing, Hardcopy, and
Applications, pp. 455466. Bellingham, WA, USA: SPIE, 2003.
[47] Scott Daly and Xiaofan Feng. Decontouring: Prevention and Removal of
False Contour Artifacts. In Proceedings of Human Vision and Electronic
Imaging IX, pp. 130149. Bellingham, WA, USA: SPIE, 2004.
[48] Scott Daly. The Visible Dierences Predictor: An Algorithm for the Assessment of Image Fidelity. In Digital Images and Human Vision, pp. 179206.
Cambridge, MA, USA: MIT Press, 1993.
[49] Herbert A. David. The Method of Paired Comparisons, Second Edition.
Oxford, UK: Oxford University Press, 1988.
[50] Paul Debevec and Jitendra Malik. Recovering High Dynamic Range Radiance Maps from Photographs. In SIGGRAPH 97: Proceedings of the
24th Annual Conference on Computer Graphics and Interactive Techniques,
pp. 369378. New York, NY, USA: ACM Press/Addison-Wesley Publishing
Co., 1997.


[51] Paul Debevec and Erik Reinhard. High Dynamic Range Imaging: Theory
and Applications. In ACM SIGGRAPH 2006 Courses. New York, NY,
USA: ACM, 2006.
[52] Paul Debevec, Tim Hawkins, Chris Tchou, Haarm-Pieter Duiker, Westley
Sarokin, and Mark Sagar. Acquiring the Reectance Field of a Human
Face. In SIGGRAPH 00: Proceedings of the 27th Annual Conference on
Computer Graphics and Interactive Techniques, pp. 145156. New York, NY,
USA: ACM Press/Addison-Wesley Publishing Co., 2000.
[53] Paul Debevec. Rendering Synthetic Objects into Real Scenes: Bridging
Traditional and Image-Based Graphics with Global Illumination and High
Dynamic Range Photography. In SIGGRAPH 98: Proceedings of the
25th Annual Conference on Computer Graphics and Interactive Techniques,
pp. 189198. New York, NY, USA: ACM, 1998.
[54] Paul Debevec. A Median Cut Algorithm for Light Probe Sampling. In
SIGGRAPH 05: ACM SIGGRAPH 2005 Posters, p. 66. New York, NY,
USA: ACM, 2005.
[55] Paul Debevec. Virtual Cinematography: Relighting through Computation. Computer 39 (2006), 5765.
[56] Piotr Didyk, Rafal Mantiuk, Matthias Hein, and Hans-Peter Seidel. Enhancement of Bright Video Features for HDR Displays. Computer Graphics
Forum 27:4 (2008), 12651274.
[57] Dolby. Dolby-DR37P. Available at http://www.dolby.com/promo/hdr/
technology.html, 2008.
[58] Frederic Drago, William Martens, Karol Myszkowski, and Hans-Peter Seidel. Perceptual Evaluation of Tone Mapping Operators with Regard to Similarity and Preference. Research Report MPI-I-2002-4-002, Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany, 2002.
[59] Frederic Drago, William Martens, Karol Myszkowski, and Norishige Chiba.
Design of a Tone Mapping Operator for High Dynamic Range Images Based
upon Psychophysical Evaluation and Preference Mapping. In Human Vision and Electronic Imaging VIII (HVEI-03), edited by Bernice Rogowitz
and Thrasyvoulos Pappas, pp. 321331. Santa Clara, USA: SPIE, 2003.
[60] Frederic Drago, Karol Myszkowski, Thomas Annen, and Norishige Chiba.
Adaptive Logarithmic Mapping for Displaying High Contrast Scenes.
Computer Graphics Forum 22:3 (2003), 419426.
[61] Fredo Durand and Julie Dorsey. Interactive Tone Mapping. In Proceedings
of the Eurographics Workshop on Rendering Techniques 2000, pp. 219230.
London, UK: Springer-Verlag, 2000.
[62] Fredo Durand and Julie Dorsey. Fast Bilateral Filtering for the Display of
High-Dynamic-Range Images. ACM Trans. Graph. 21:3 (2002), 257266.


[63] Mark D. Fairchild and Garrett M. Johnson. Meet iCAM: A NextGeneration Color Appearance Model. In The Tenth Color Imaging Conference, pp. 3338. Springeld, VA, USA: IS&T - The Society for Imaging
Science and Technology, 2002.
[64] Mark D. Fairchild. Color Appearance Models, Second Edition. New York,
NY, USA: Wiley-IS&T, 2005.
[65] Mark Fairchild. The HDR Photographic Survey. Available at http://www.
cis.rit.edu/fairchild/HDR.html, 2008.
[66] Hany Farid. Blind Inverse Gamma Correction. IEEE Transactions on
Image Processing 10:10 (2001), 14281433.
[67] Raanan Fattal, Dani Lischinski, and Michael Werman. Gradient Domain
High Dynamic Range Compression. ACM Trans. Graph. 21:3 (2002), 249
256.
[68] James A. Ferwerda, Sumanta N. Pattanaik, Peter Shirley, and Donald P.
Greenberg. A Model of Visual Adaptation for Realistic Image Synthesis.
In SIGGRAPH 96: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pp. 249258. New York, NY,
USA: ACM, 1996.
[69] James A. Ferwerda, Peter Shirley, Sumanta N. Pattanaik, and Donald P.
Greenberg. A Model of Visual Masking for Computer Graphics. In
SIGGRAPH 97: Proceedings of the 24th Annual Conference on Computer
Graphics and Interactive Techniques, pp. 143152. New York, NY, USA:
ACM Press/Addison-Wesley Publishing Co., 1997.
[70] Brian Funt, Florian Ciurea, and John McCann. Retinex in Matlab. In
Proceedings of the IS&T/SID Eighth Color Imaging Conference: Color Science, Systems and Applications, pp. 112121. Scottsdale, AZ, USA: Society
for Imaging Science and Technology, 2000.
[71] Orazio Gallo, Natasha Gelfand, Wei-Chao Chen, Marius Tico, and Kari
Pulli. Artifact-Free High Dynamic Range Imaging. In IEEE International
Conference on Computational Photography (ICCP), pp. 17. Washington,
DC, USA: IEEE, 2009.
[72] Abhijeet Ghosh, Arnaud Doucet, and Wolfgang Heidrich. Sequential Sampling for Dynamic Environment Maps. In SIGGRAPH 06: ACM SIGGRAPH 2006 Sketches, p. 157. New York, NY, USA: ACM, 2006.
[73] Alan Gilchrist, Christos Kossydis, Frederick Bonato, Tiziano Agostini,
Xiaojun Li Joseph Cataliotti, Branka Spehar, Vidal Annan, and Elias
Economou. An Anchoring Theory of Lightness Perception. Psychological Review 106:4 (1999), 795834.
[74] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing.
Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2001.
[75] Robin Green. Spherical Harmonics Lighting: The Gritty Details. In Game
Developers Conference, pp. 147, 2003.


[76] Ned Greene. Environment Mapping and Other Applications of World Projections. IEEE Computer Graphics and Applications 6:11 (1986), 2129.
[77] Michael D. Grossberg and Shree K. Nayar. Modeling the Space of Camera
Response Functions. IEEE Transactions on Pattern Analysis and Machine
Intelligence 26:10 (2004), 12721282.
[78] Stanford Graphics Group. The Stanford 3D Scanning Repository. Available at http://graphics.stanford.edu/data/3Dscanrep/, 2008.
[79] J. Hans Van Hateren and T. D. Lamb. The Photocurrent Response of
Human Cones Is Fast and Monophasic. BMC Neuroscience 7:34 (2006),
18.
[80] J. Hans Van Hateren. Encoding of High Dynamic Range Video with a
Model of Human Cones. ACM Trans. Graph. 25:4 (2006), 13801399.
[81] Vlastimil Havran, Kirill Dmitriev, and Hans-Peter Seidel. Goniometric
Diagram Mapping for Hemisphere. pp. 293300. Paper presented at Eurographics 2003, 2003.
[82] Vlastimil Havran, Miloslaw Smyk, Grzegorz Krawczyk, Karol Myszkowski,
and Hans-Peter Seidel. Importance Sampling for Video Environment
Maps. In Eurographics Symposium on Rendering 2005, edited by Kavita
Bala and Philip Dutre, pp. 3142, 311. Konstanz, Germany: ACM SIGGRAPH, 2005.
[83] Mary M. Hayhoe, Norma I. Benimo, and D. C. Hood. The Time Course
of Multiplicative and Subtractive Adaptation Process. Vision Research 27
(1987), 19811996.
[84] Donald Healy and O. Mitchell. Digital Video Bandwidth Compression
Using Block Truncation Coding. IEEE Transactions on Communications
29:12 (1981), 18091817.
[85] Berthold K. Horn. Determining Lightness from an Image. Computer
Graphics and Image Processing 3:1 (1974), 277299.
[86] David Hough. Applications of the Proposed IEEE-754 Standard for Floating Point Arithmetic. Computer 14:3 (1981), 7074.
[87] Eric Howlett. Wide-Angle Orthostereo. In Stereoscopic Displays and Applications, edited by John O. Merritt and Scott S. Fisher, pp. 210223. Santa
Clara, CA, USA: SPIE, 1990.
[88] Rober W. G. Hunt. The Reproduction of Colour. Kingston-upon-Thames,
England: Fountain Press Ltd, 1995.
[89] Industrial Light & Magic. OpenEXR. Available at http://www.openexr.
org, 2008.
[90] Konstantine Iourcha, Krishna Nayak, and Zhou Hong. System and Method
for Fixed-Rate Block-Based Image Compression with Inferred Pixel Values.
Patent no. 5,956,431, 1997.


[91] Piti Irawan, James A. Ferwerda, and Stephen R. Marschner. Perceptually


Based Tone Mapping of High Dynamic Range Image Streams. In Proceedings of the 16th Eurographics Symposium on Rendering, edited by Oliver
Deussen, Alexander Keller, Kavita Bala, Philip Dutr, Dieter W. Fellner,
and Stephen N. Spencer, pp. 231242. Konstanz, Germany: Eurographics
Association, 2005.
[92] ITU. ITU-R BT.709, Basic Parameter Values for the HDTV Standard
for the Studio and for International Programme Exchange. In Standard
Recommendation 709, International Telecommunication Union, 1990.
[93] Kei Iwasaki, Yoshinori Dobashi, Fujiichi Yoshimoto, and Tomoyuki Nishita.
Precomputed Radiance Transfer for Dynamic Scenes Taking into Account
Light Interreection. In Rendering Techniques 2007: 18th Eurographics
Workshop on Rendering, pp. 3544. Aire-la-Ville, Switzerland: Eurographics
Association, 2007.
[94] Katrien Jacobs, Celine Loscos, and Greg Ward. Automatic High-Dynamic
Range Image Generation for Dynamic Scenes. IEEE Comput. Graph. Appl.
28:2 (2008), 8493.
[95] James T. Kajiya. The Rendering Equation.
Graphics 20:4 (1986), 143150.

SIGGRAPH Computer

[96] Sing Bing Kang, Matthew Uyttendaele, Simon Winder, and Richard Szeliski.
High Dynamic Range Video. ACM Trans. Graph. 22:3 (2003), 319325.
[97] Maurice Kendall. Rank Correlation Methods, Fourth Edition. Baltimore,
MD, USA: Grin Ltd., 1975.
[98] Erum A. Khan, Ahmet Oğuz Akyüz, and Erik Reinhard. Ghost Removal in High Dynamic Range Images. In IEEE International Conference on Image Processing, pp. 2005–2008. Washington, DC, USA: IEEE, 2006.
[99] Mark Kilgard, Pat Brown, and Jon Leech. GL EXT texture shared
exponent. In OpenGL Extension. Available at http://www.opengl.org/
registry/specs/EXT/texture shared exponent.txt, 2007.
[100] Gunter Knittel, Andreas Schilling, Anders Kugler, and Wolfgang Strasser.
Hardware for Superior Texture Performance. Computers & Graphics 20:4
(1996), 475481.
[101] Thomas Kollig and Alexander Keller. Ecient Illumination by High Dynamic Range Images. In EGRW 03: Proceedings of the 14th Eurographics
Workshop on Rendering, pp. 4550. Aire-la-Ville, Switzerland: Eurographics
Association, 2003.
[102] Johannes Kopf, Michael F. Cohen, Dani Lischinski, and Matt Uyttendaele.
Joint Bilateral Upsampling. ACM Trans. Graph. 26:3 (2007), 96.
[103] Rafael Pacheco Kovaleski and Manuel M. Oliveira. High-Quality Brightness Enhancement Functions for Real-Time Reverse Tone Mapping. Vis.
Comput. 25:57 (2009), 539547.
[104] Grzegorz Krawczyk, Karol Myszkowski, and Hans-Peter Seidel. Lightness
Perception in Tone Reproduction for High Dynamic Range Images. In


The European Association for Computer Graphics 26th Annual Conference


EUROGRAPHICS 2005, pp. 635645. Dublin, Ireland: Blackwell, 2005.
[105] Jiangtao Kuang, Hiroshi Yamaguchi, Garrett M. Johnson, and Mark D.
Fairchild. Testing HDR Image Rendering Algorithms. In IS&T/SID 12th
Color Imaging Conference, pp. 315320. Scottsdale, AZ, USA: SPIE, 2004.
[106] Jiangtao Kuang, Garrett M. Johnson, and Mark D. Fairchild. iCAM06:
A Rened Image Appearance Model for HDR Image Rendering. J. Vis.
Comun. Image Represent. 18:5 (2007), 406414.
[107] Jiangtao Kuang, Hiroshi Yamaguchi, Changmeng Liu, Garrett M. Johnson,
and Mark D. Fairchild. Evaluating HDR Rendering Algorithms. ACM
Transaction on Applied Perception 4:2 (2007), 9.
[108] Edwin Land. Recent Advances in Retinex Theory. Vision Research 19:1
(1986), 721.
[109] Hayden Landis. Production-Ready Global Illumination. In SIGGRAPH
Course Notes 16, pp. 87101, 2002.
[110] Gregory Ward Larson, Holly Rushmeier, and Christine Piatko. A Visibility Matching Tone Reproduction Operator for High Dynamic Range Scenes.
IEEE Transactions on Visualization and Computer Graphics 3:4 (1997),
291306.
[111] Gregory Ward Larson. LogLuv Encoding for Full-Gamut, High-Dynamic
Range Images. Journal of Graphics Tools 3:1 (1998), 1531.
[112] Patrick Ledda, Greg Ward, and Alan Chalmers. A Wide Field, High
Dynamic Range, Stereographic Viewer. In GRAPHITE 03: Proceedings
of the 1st International Conference on Computer Graphics and Interactive
Techniques in Australasia and South East Asia, pp. 237244. New York, NY,
USA: ACM, 2003.
[113] Patrick Ledda, Luis Paulo Santos, and Alan Chalmers. A Local Model
of Eye Adaptation for High Dynamic Range Images. In AFRIGRAPH
04: Proceedings of the 3rd International Conference on Computer Graphics,
Virtual Reality, Visualisation and Interaction in Africa, pp. 151160. New
York, NY, USA: ACM, 2004.
[114] Patrick Ledda, Alan Chalmers, Tom Troscianko, and Helge Seetzen. Evaluation of Tone Mapping Operators Using a High Dynamic Range Display.
In SIGGRAPH 05: ACM SIGGRAPH 2005 Papers, pp. 640648. New York,
NY, USA: ACM, 2005.
[115] Chu Lee and Chang-Su Kim. Gradient Domain Tone Mapping of High
Dynamic Range Videos. In ICIP07, pp. 461464, 2007.
[116] Chu Lee and Chang-Su Kim. Rate-Distortion Optimized Compression of
High Dynamic Range Videos. In 16th European Signal Processing Conference (EUSIPCO 2008), pp. 461464, 2008.
[117] Anat Levin, Dani Lischinski, and Yair Weiss. Colorization Using Optimization. ACM Trans. Graph. 23:3 (2004), 689694.


[118] Marc Levoy and Pat Hanrahan. Light Field Rendering. In SIGGRAPH
'96: Proceedings of the 23rd Annual Conference on Computer Graphics and
Interactive Techniques, pp. 31–42. New York, NY, USA: ACM, 1996.
[119] Yuanzhen Li, Lavanya Sharan, and Edward H. Adelson. Compressing and
Companding High Dynamic Range Images with Subband Architectures.
ACM Trans. Graph. 24:3 (2005), 836–844.
[120] Shigang Li. Real-Time Spherical Stereo. In ICPR '06: Proceedings of
the 18th International Conference on Pattern Recognition, pp. 1046–1049.
Washington, DC, USA: IEEE Computer Society, 2006.
[121] Stephen Lin and Lei Zhang. Determining the Radiometric Response Function from a Single Grayscale Image. In CVPR '05: Proceedings of the
2005 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR'05), pp. 66–73. Washington, DC, USA: IEEE Computer
Society, 2005.
[122] Stephen Lin, Jinwei Gu, Shuntaro Yamazaki, and Heung-Yeung Shum.
Radiometric Calibration from a Single Image. In CVPR 2004: Proceedings
of the 2004 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR2004), pp. 938–945. Washington, DC, USA: IEEE Computer Society,
2004.
[123] Dani Lischinski, Zeev Farbman, Matt Uyttendaele, and Richard Szeliski.
Interactive Local Adjustment of Tonal Values. ACM Trans. Graph. 25:3
(2006), 646–653.
[124] Xiaopei Liu, Liang Wan, Yingge Qu, Tien-Tsin Wong, Stephen Lin, Chi-Sing Leung, and Pheng-Ann Heng. Intrinsic Colorization. ACM Trans.
Graph. 27:5 (2008), 1–9.
[125] E3D Creative LLC. E3D Stereo Rig. Available at http://e3dcreative.
com/, 2010.
[126] Stuart P. Lloyd. Least Squares Quantization in PCM. IEEE Transactions
on Information Theory 28:2 (1982), 129–137.
[127] Jeffrey Lubin. A Visual Discrimination Model for Imaging System Design and Evaluation, pp. 245–283. River Edge, NJ, USA: World Scientific
Publishers, 1995.
[128] Thomas Luft, Carsten Colditz, and Oliver Deussen. Image Enhancement
by Unsharp Masking the Depth Buffer. ACM Trans. Graph. 25:3 (2006),
1206–1213.
[129] Max Lyons. Max Lyons's HDR Images Gallery. Available at http://www.
tawbaware.com/maxlyons/, 2008.
[130] Basil Mahon. The Man Who Changed Everything: The Life of James Clerk
Maxwell. New York, NY, USA: John Wiley & Sons Ltd., 2004.
[131] Steve Mann and Rosalind W. Picard. Being Undigital with Digital
Cameras: Extending Dynamic Range by Combining Differently Exposed
Pictures. In Proceedings of IS&T 48th Annual Conference, pp. 422–428.
Society for Imaging Science and Technology, 1995.

[132] Rafal Mantiuk and Hans-Peter Seidel. Modeling a Generic Tone-Mapping
Operator. Computer Graphics Forum 27:2 (2008), 699–708.
[133] Rafal Mantiuk, Grzegorz Krawczyk, Karol Myszkowski, and Hans-Peter
Seidel. Perception-Motivated High Dynamic Range Video Encoding. ACM
Trans. Graph. 23:3 (2004), 733–741.
[134] Rafal Mantiuk, Karol Myszkowski, and Hans-Peter Seidel. Visible Difference Predicator for High Dynamic Range Images. In Proceedings of IEEE
International Conference on Systems, Man and Cybernetics, pp. 2763–2769,
2004.
[135] Rafal Mantiuk, Scott Daly, Karol Myszkowski, and Hans-Peter Seidel.
Predicting Visible Differences in High Dynamic Range Images – Model and
Its Calibration. In Human Vision and Electronic Imaging X, IS&T/SPIE's
17th Annual Symposium on Electronic Imaging, edited by Bernice E. Rogowitz, Thrasyvoulos N. Pappas, and Scott J. Daly, pp. 204–214. SPIE,
2005.
[136] Rafal Mantiuk, Alexander Efremov, Karol Myszkowski, and Hans-Peter
Seidel. Backward Compatible High Dynamic Range MPEG Video Compression. ACM Trans. Graph. 25:3 (2006), 713–723.
[137] Rafal Mantiuk, Scott Daly, and Louis Kerofsky. Display Adaptive Tone
Mapping. ACM Trans. Graph. 27:3 (2008), 1–10.
[138] Radoslaw Mantiuk, Rafal Mantiuk, Anna Tomaszewska, and Wolfgang Heidrich. Color Correction for Tone Mapping. Proceedings of Eurographics
2009 28:2 (2009), 193–202.
[139] Belen Masia, Sandra Agustin, Roland W. Fleming, Olga Sorkine, and Diego
Gutierrez. Evaluation of Reverse Tone Mapping Through Varying Exposure
Conditions. ACM Trans. Graph. 28:5 (2009), 1–8.
[140] George Mather. Foundations of Perception, First Edition. Hove, East
Sussex, UK: Psychology Press, 2006.
[141] Mathworks. Image Processing Toolbox. Available at http://www.
mathworks.com/products/image/, 2010.
[142] Jerry M. Mendel. Tutorial on Higher-Order Statistics (Spectra) in Signal Processing and System Theory: Theoretical Results and Some Applications. Proceedings of the IEEE 79:3 (1991), 278–305.
[143] Tom Mertens, Jan Kautz, and Frank Van Reeth. Exposure Fusion. In PG
'07: Proceedings of the 15th Pacific Conference on Computer Graphics and
Applications, pp. 382–390. Washington, DC, USA: IEEE Computer Society,
2007.
[144] Laurence Meylan and Sabine Süsstrunk. High Dynamic Range Image
Rendering with a Retinex-Based Adaptive Filter. IEEE Transactions on
Image Processing 15:9 (2006), 2820–2830.
[145] Laurence Meylan, Scott Daly, and Sabine Süsstrunk. The Reproduction
of Specular Highlights on High Dynamic Range Displays. In IS&T/SID 14th
Color Imaging Conference, pp. 333–338. Scottsdale, AZ, USA, 2006.

[146] Laurence Meylan, Scott Daly, and Sabine Süsstrunk. Tone Mapping for
High Dynamic Range Displays. In Proc. IS&T/SPIE Electronic Imaging:
Human Vision and Electronic Imaging XII. SPIE, 2007.
[147] Microsoft Corporation. DirectX. Available at http://msdn.microsoft.
com/en-us/directx/default.aspx, 2010.
[148] Gene Miller and C. Robert Hoffman. Illumination and Reflection Maps:
Simulated Objects in Simulated and Real Environments. In Siggraph 84
Advanced Computer Graphics Animation Seminar Note. New York, NY,
USA: ACM Press, 1984.
[149] Tomoo Mitsunaga and Shree K. Nayar. Radiometric Self Calibration.
IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) 1 (1999), 1374.
[150] Moving Picture Experts Group. ISO/IEC 14496-2: 1999 (MPEG-4, Part
2). In ISO, 1999.
[151] Jacob Munkberg, Petrik Clarberg, Jon Hasselgren, and Tomas Akenine-Möller. High Dynamic Range Texture Compression for Graphics Hardware. ACM Trans. Graph. 25:3 (2006), 698–706.
[152] Ken-Ichi Naka and William A. H. Rushton. S-potentials from Luminosity
Units in the Retina of Fish (Cyprinidae). Journal of Physiology 185:3
(1966), 587–599.
[153] Shree K. Nayar and Vlad Branzoi. Adaptive Dynamic Range Imaging:
Optical Control of Pixel Exposures Over Space and Time. In ICCV '03:
Proceedings of the Ninth IEEE International Conference on Computer Vision, pp. 1168–1175. Washington, DC, USA: IEEE Computer Society, 2003.
[154] Laszlo Neumann, Krešimir Matković, and Werner Purgathofer. Automatic
Exposure in Computer Graphics Based on the Minimum Information Loss
Principle. In CGI '98: Proceedings of the Computer Graphics International
1998, p. 666. Washington, DC, USA: IEEE Computer Society, 1998.
[155] NeuriCam. NC1805Pupilla. Available at http://www.neuricam.com/,
2008.
[156] NextLimits. Maxwell Render. Available at http://www.maxwellrender.
com/, 2008.
[157] Ren Ng, Ravi Ramamoorthi, and Pat Hanrahan. All-Frequency Shadows
Using Non-linear Wavelet Lighting Approximation. ACM Trans. Graph.
22:3 (2003), 376–381.
[158] Ren Ng, Ravi Ramamoorthi, and Pat Hanrahan. Triple Product Wavelet
Integrals for All-Frequency Relighting. ACM Trans. Graph. 23:3 (2004),
477–487.
[159] Masahiro Okuda and Nicola Adami. Two-Layer Coding Algorithm for
High Dynamic Range Images Based on Luminance Compensation. J. Vis.
Comun. Image Represent. 18:5 (2007), 377–386.
[160] Omron. FZ3 Series. Available at http://www.ia.omron.com/, 2008.

[161] Victor Ostromoukhov, Charles Donohue, and Pierre-Marc Jodoin. Fast
Hierarchical Importance Sampling with Blue Noise Properties. ACM Trans.
Graph. 23:3 (2004), 488–495.
[162] Minghao Pan, Rui Wang, Xinguo Liu, Qunsheng Peng, and Hujun Bao.
Precomputed Radiance Transfer Field for Rendering Interreflections in Dynamic Scenes. Computer Graphics Forum 26:3 (2007), 485–493.
[163] Panavision. Genesis Digital Camera System. Available at http://www.
panavision.com/, 2010.
[164] Panoscan. Panoscan MK-3. Available at http://www.panoscan.com/,
2008.
[165] Sung Ho Park and Ethan D. Montag. Evaluating Tone Mapping Algorithms for Rendering Non-pictorial (Scientific) High-Dynamic-Range Images. J. Vis. Comun. Image Represent. 18:5 (2007), 415–428.
[166] Sumanta N. Pattanaik and Hector Yee. Adaptive Gain Control for High
Dynamic Range Image Display. In SCCG '02: Proceedings of the 18th
Spring Conference on Computer Graphics, pp. 83–87. New York, NY, USA:
ACM, 2002.
[167] Sumanta N. Pattanaik, James A. Ferwerda, Mark D. Fairchild, and Donald P. Greenberg. A Multiscale Model of Adaptation and Spatial Vision for
Realistic Image Display. In SIGGRAPH '98: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, pp. 287–298. New York, NY, USA: ACM, 1998.
[168] Sumanta N. Pattanaik, Jack Tumblin, Hector Yee, and Donald P. Greenberg. Time-Dependent Visual Adaptation for Fast Realistic Image Display. In SIGGRAPH '00: Proceedings of the 27th Annual Conference on
Computer Graphics and Interactive Techniques, pp. 47–54. New York, NY,
USA: ACM Press/Addison-Wesley Publishing Co., 2000.
[169] Fabio Pellacini, James A. Ferwerda, and Donald P. Greenberg. Toward
a Psychophysically-Based Light Reflection Model for Image Synthesis. In
SIGGRAPH '00: Proceedings of the 27th Annual Conference on Computer
Graphics and Interactive Techniques, pp. 55–64. New York, NY, USA: ACM
Press/Addison-Wesley Publishing Co., 2000.
[170] Patrick Perez, Michel Gangnet, and Andrew Blake. Poisson Image Editing. ACM Trans. Graph. 22:3 (2003), 313–318.
[171] Ken Perlin and Eric M. Hoffert. Hypertexture. In Computer Graphics
(Proceedings of ACM SIGGRAPH 89), pp. 253–262. New York, NY, USA:
ACM, 1989.
[172] Georg Petschnigg, Richard Szeliski, Maneesh Agrawala, Michael Cohen,
Hugues Hoppe, and Kentaro Toyama. Digital Photography with Flash and
No-Flash Image Pairs. ACM Trans. Graph. 23:3 (2004), 664–672.
[173] Tuan Q. Pham and Luca J. van Vliet. Separable Bilateral Filtering for Fast
Video Preprocessing. pp. 1–4. Los Alamitos, CA, USA: IEEE Computer
Society, 2005.

[174] Matt Pharr and Greg Humphreys. Physically Based Rendering: From Theory to Implementation. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2004.
[175] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P.
Flannery. Numerical Recipes, Third Edition: The Art of Scientific Computing. Cambridge, UK: Cambridge University Press, 2007.
[176] PtGreyResearch. Firefly MV. Available at http://www.ptgrey.com/,
2008.
[177] Zia ur Rahman, Daniel J. Jobson, and Glenn A. Woodell. Multi-Scale
Retinex for Color Image Enhancement. In Proceedings of the International
Conference on Image Processing, pp. 1003–1006. Lausanne, Switzerland:
IEEE, 1996.
[178] Ravi Ramamoorthi and Pat Hanrahan. An Efficient Representation for
Irradiance Environment Maps. In SIGGRAPH '01: Proceedings of the
28th Annual Conference on Computer Graphics and Interactive Techniques,
pp. 497–500. New York, NY, USA: ACM, 2001.
[179] RedCompany. Red One. Available at http://www.red.com/, 2008.
[180] Erik Reinhard, Michael Stark, Peter Shirley, and James Ferwerda. Photographic Tone Reproduction for Digital Images. ACM Trans. Graph. 21:3
(2002), 267–276.
[181] Erik Reinhard. Parameter Estimation for Photographic Tone Reproduction. Journal of Graphics Tools 7:1 (2002), 45–52.
[182] Allan G. Rempel, Matthew Trentacoste, Helge Seetzen, H. David Young,
Wolfgang Heidrich, Lorne Whitehead, and Greg Ward. LDR2HDR: On-the-Fly Reverse Tone Mapping of Legacy Video and Photographs. ACM
Trans. Graph. 26:3 (2007), 39.
[183] Lawrence Roberts. Picture Coding Using Pseudo-random Noise. IEEE
Transactions on Information Theory 8:2 (1962), 145–154.
[184] Mark A. Robertson, Sean Borman, and Robert L. Stevenson. Dynamic
Range Improvement Through Multiple Exposures. In Proceedings of the
1999 International Conference on Image Processing (ICIP-99), pp. 159–163.
Los Alamitos, CA, USA: IEEE, 1999.
[185] Mark A. Robertson, Sean Borman, and Robert L. Stevenson. Estimation-Theoretic Approach to Dynamic Range Enhancement Using Multiple Exposures. Journal of Electronic Imaging 12:2 (2003), 219–228.
[186] Kimmo Roimela, Tomi Aarnio, and Joonas Itäranta. High Dynamic Range
Texture Compression. ACM Trans. Graph. 25:3 (2006), 707–712.
[187] Kimmo Roimela, Tomi Aarnio, and Joonas Itäranta. Efficient High Dynamic Range Texture Compression. In SI3D '08: Proceedings of the 2008
Symposium on Interactive 3D Graphics and Games, pp. 207–214. New York,
NY, USA: ACM, 2008.

[188] Imari Sato, Yoichi Sato, and Katsushi Ikeuchi. Acquiring a Radiance
Distribution to Superimpose Virtual Objects onto a Real Scene. IEEE
Transactions on Visualization and Computer Graphics 5:1 (1999), 1–12.
[189] Christophe Schlick. Quantization Techniques for Visualization of High
Dynamic Range Pictures. In Proceedings of the Fifth Eurographics Workshop
on Rendering, pp. 7–18, 1994.
[190] Helge Seetzen, Wolfgang Heidrich, Wolfgang Stuerzlinger, Greg Ward,
Lorne Whitehead, Matthew Trentacoste, Abhijeet Ghosh, and Andrejs
Vorozcovs. High Dynamic Range Display Systems. ACM Trans. Graph.
23:3 (2004), 760–768.
[191] Peter-Pike Sloan, Jan Kautz, and John Snyder. Precomputed Radiance
Transfer for Real-Time Rendering in Dynamic, Low-Frequency Lighting Environments. ACM Trans. Graph. 21:3 (2002), 527–536.
[192] Spheron. Spheron HDR VR. Available at http://www.spheron.com/,
2008.
[193] Stanley S. Stevens and J.C. Stevens. Brightness Function: Parametric
Effects of Adaptation and Contrast. Journal of the Optical Society of America
50:11 (1960), 1139.
[194] J.C. Stevens and Stanley S. Stevens. Brightness Function: Effects of
Adaptation. Journal of the Optical Society of America 53:3 (1963), 375–385.
[195] Michael Stokes, Matthew Anderson, Srinivasan Chandrasekar, and Ricardo
Motta. A Standard Default Color Space for the Internet – sRGB. Available at http://www.w3.org/Graphics/Color/sRGB.html, 1996.
[196] Eric J. Stollnitz, Tony D. DeRose, and David H. Salesin. Wavelets for
Computer Graphics: A Primer. IEEE Comput. Graph. Appl. 15:3 (1995),
76–84.
[197] Wen Sun, Yan Lu, Feng Wu, and Shipeng Li. DHTC: An Effective DXTC-based HDR Texture Compression Scheme. In GH '08: Proceedings of
the 23rd ACM Siggraph/Eurographics Symposium on Graphics Hardware,
pp. 85–94. Aire-la-Ville, Switzerland: Eurographics Association, 2008.
[198] Justin Talbot, David Cline, and Parris K. Egbert. Importance Resampling
for Global Illumination. In Rendering Techniques 2005 Eurographics Symposium on Rendering, pp. 139–146. Aire-la-Ville, Switzerland: Eurographics
Association, 2005.
[199] Chris Tchou, Jessi Stumpfel, Per Einarsson, Marcos Fajardo, and Paul
Debevec. Unlighting the Parthenon. In SIGGRAPH '04: ACM Siggraph
2004 Sketches, p. 80. New York, NY, USA: ACM, 2004.
[200] ThomsonGrassValley. Viper FilmStream. Available at http://www.
thomsongrassvalley.com/, 2008.
[201] Carlo Tomasi and Roberto Manduchi. Bilateral Filtering for Gray and
Color Images. In ICCV '98: Proceedings of the Sixth International Conference on Computer Vision, p. 839. Washington, DC, USA: IEEE Computer
Society, 1998.

[202] Jack Tumblin and Holly Rushmeier. Tone Reproduction for Realistic
Images. IEEE Comput. Graph. Appl. 13:6 (1993), 42–48.
[203] Jack Tumblin and Greg Turk. LCIS: A Boundary Hierarchy for Detail-Preserving Contrast Reduction. In SIGGRAPH '99: Proceedings of the
26th Annual Conference on Computer Graphics and Interactive Techniques,
pp. 83–90. New York, NY, USA: ACM Press/Addison-Wesley Publishing
Co., 1999.
[204] Jack Tumblin, Jessica K. Hodgins, and Brian K. Guenter. Two Methods
for Display of High Contrast Images. ACM Trans. Graph. 18:1 (1999),
56–94.
[205] Jonas Unger and Stefan Gustavson. High Dynamic Range Video for Photometric Measurement of Illumination. In Proceedings of Sensors, Cameras
and Systems for Scientific/Industrial Applications X, IS&T/SPIE 19th International Symposium on Electronic Imaging. SPIE, 2007.
[206] Jonas Unger, Anders Wenger, Tim Hawkins, A. Gardner, and Paul Debevec. Capturing and Rendering with Incident Light Fields. In EGRW
'03: Proceedings of the 14th Eurographics Workshop on Rendering, pp. 141–149. Aire-la-Ville, Switzerland: Eurographics Association, 2003.
[207] Jonas Unger, Stefan Gustavson, Per Larsson, and Anders Ynnerman. Free
Form Incident Light Fields. Computer Graphics Forum 27:4 (2008), 1293–1301.
[208] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. New York,
NY, USA: Springer-Verlag, 1995.
[209] Eric Veach and Leonidas J. Guibas. Optimally Combining Sampling Techniques for Monte Carlo Rendering. In SIGGRAPH '95: Proceedings of the
22nd Annual Conference on Computer Graphics and Interactive Techniques,
pp. 419–428. New York, NY, USA: ACM, 1995.
[210] Silicon Vision. Silicon Vision Lars III. Available at http://www.si-vision.
com/, 2010.
[211] VisionResearch. Phantom HD. Available at http://www.visionresearch.
com/, 2008.
[212] Ingo Wald, William R. Mark, Johannes Günther, Solomon Boulos, Thiago
Ize, Warren A. Hunt, Steven G. Parker, and Peter Shirley. State of the
Art in Ray Tracing Animated Scenes. Comput. Graph. Forum 28:6 (2009),
1691–1722.
[213] Jan Walraven and J. Mathe Valeton. Visual Adaptation and Response
Saturation. In Limits in Perception, edited by W. A. Van de Grind and
J. J. Koenderink, pp. 401–429. The Netherlands: VNU Science Press, 1984.
[214] Bruce Walter, Sebastian Fernandez, Adam Arbree, Kavita Bala, Michael
Donikian, and Donald P. Greenberg. Lightcuts: A Scalable Approach to
Illumination. ACM Trans. Graph. 24:3 (2005), 1098–1107.

[215] Liang Wan, Tien-Tsin Wong, and Chi-Sing Leung. Spherical Q2-tree for
Sampling Dynamic Environment Sequences. In Proceedings of Eurographics
Symposium on Rendering, pp. 21–30. Aire-la-Ville, Switzerland: Eurographics Association, 2005.
[216] Zhou Wang and Alan Bovik. A Universal Image Quality Index. IEEE
Signal Processing Letters 9:3 (2002), 81–84.
[217] Lvdi Wang, Xi Wang, Peter-Pike Sloan, Li-Yi Wei, Xin Tong, and Baining Guo. Rendering from Compressed High Dynamic Range Textures on
Programmable Graphics Hardware. In I3D '07: Proceedings of the 2007
Symposium on Interactive 3D Graphics and Games, pp. 17–24. New York,
NY, USA: ACM, 2007.
[218] Lvdi Wang, Li-Yi Wei, Kun Zhou, Baining Guo, and Heung-Yeung Shum.
High Dynamic Range Image Hallucination. In SIGGRAPH '07: ACM
SIGGRAPH 2007 Sketches, p. 72. New York, NY, USA: ACM, 2007.
[219] Greg Ward and Maryann Simmons. Subband Encoding of High Dynamic
Range Imagery. In APGV '04: Proceedings of the 1st Symposium on Applied Perception in Graphics and Visualization, pp. 83–90. New York, NY,
USA: ACM Press, 2004.
[220] Greg Ward and Maryann Simmons. JPEG-HDR: A Backwards-Compatible, High Dynamic Range Extension to JPEG. In SIGGRAPH
'05: ACM SIGGRAPH 2005 Courses, p. 2. New York, NY, USA: ACM,
2005.
[221] Greg Ward. Real Pixels. Graphics Gems 2 (1991), 15–31.
[222] Greg Ward. A Contrast-Based Scalefactor for Luminance Display. Boston,
MA, USA: Academic Press, 1994.
[223] Greg Ward. The Radiance Lighting Simulation and Rendering System. In
SIGGRAPH '94: Proceedings of the 21st Annual Conference on Computer
Graphics and Interactive Techniques, pp. 459–472. New York, NY, USA:
ACM, 1994.
[224] Greg Ward. A Wide Field, High Dynamic Range, Stereographic Viewer.
In Proceedings of PICS 2002. Portland, OR, USA, 2002.
[225] Greg Ward. Greg Ward's HDR Images Gallery. Available at http://www.
anyhere.com/gward/, 2008.
[226] Andrew B. Watson and Joshua A. Solomon. Model of Visual Contrast
Gain Control and Pattern Masking. Journal of the Optical Society of America 14:9 (1997), 2379–2391.
[227] Andrew B. Watson. Temporal Sensitivity. In Handbook of Perception
and Human Performance, Volume I, pp. 6-1–6-43. New York, NY, USA:
John Wiley & Sons, 1986.
[228] Andrew B. Watson. The Cortex Transform: Rapid Computation of Simulated Neural Images. Comput. Vision Graph. Image Process. 39:3 (1987),
311–327.

[229] Weiss AG. Civetta HDR. Available at http://www.weiss-ag.us/, 2010.


[230] Ben Weiss. Fast Median and Bilateral Filtering. ACM Trans. Graph.
25:3 (2006), 519–526.
[231] Tomihisa Welsh, Michael Ashikhmin, and Klaus Mueller. Transferring
Color to Greyscale Images. ACM Trans. Graph. 21:3 (2002), 277–280.
[232] Turner Whitted. An Improved Illumination Model for Shaded Display.
In SIGGRAPH '79: Proceedings of the 6th Annual Conference on Computer
Graphics and Interactive Techniques, p. 14. New York, NY, USA: ACM,
1979.
[233] Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard, and Ajay Luthra.
Overview of the H.264/AVC Video Coding Standard. IEEE Transactions
on Circuits and Systems for Video Technology 13:7 (2003), 560–576.
[234] Lance Williams. Casting Curved Shadows on Curved Surfaces. In SIGGRAPH '78: Proceedings of the 5th Annual Conference on Computer Graphics and Interactive Techniques, pp. 270–274. New York, NY, USA: ACM,
1978.
[235] Hugh R. Wilson. Psychophysical Models of Spatial Vision and Hyperacuity. In Vision and Visual Dysfunction, pp. 64–81. Boca Raton, FL, USA:
CRC Press, 1991.
[236] Ruifeng Xu, Sumanta N. Pattanaik, and Charles E. Hughes. High-Dynamic-Range Still-Image Encoding in JPEG 2000. IEEE Computer
Graphics and Applications 25:6 (2005), 57–64.
[237] Xvid. Xvid. Available at http://www.xvid.com/, 2010.
[238] Hector Yee and Sumanta N. Pattanaik. Segmentation and Adaptive Assimilation for Detail-Preserving Display of High-Dynamic Range Images.
The Visual Computer 19:7/8 (2003), 457–466.
[239] Akiko Yoshida, Volker Blanz, Karol Myszkowski, and Hans-Peter Seidel. Perceptual Evaluation of Tone Mapping Operators with Real-World
Scenes. In Human Vision and Electronic Imaging X, IS&T/SPIE's 17th
Annual Symposium on Electronic Imaging (2005), edited by Bernice E. Rogowitz, Thrasyvoulos N. Pappas, and Scott J. Daly, pp. 192–203. San Jose,
CA, USA: SPIE, 2005.
[240] Akiko Yoshida, Rafal Mantiuk, Karol Myszkowski, and Hans-Peter Seidel.
Analysis of Reproducing Real-World Appearance on Displays of Varying
Dynamic Range. Computer Graphics Forum 25:3 (2006), 415–426.
[241] Akiko Yoshida, Volker Blanz, Karol Myszkowski, and Hans-Peter Seidel.
Testing Tone Mapping Operators with Human-Perceived Reality. Journal
of Electronic Imaging 16:1 (2007), 013004-1–013004-14.
[242] Kun Zhou, Yaohua Hu, Stephen Lin, Baining Guo, and Heung-Yeung
Shum. Precomputed Shadow Fields for Dynamic Scenes. ACM Trans.
Graph. 24:3 (2005), 1196–1201.
