HDR Image Reconstruction Using Segmented Image Learning
ABSTRACT Converting a low dynamic range (LDR) image into a high dynamic range (HDR) image
produces an image that closely depicts the real world without requiring expensive devices. Recent deep
learning developments can produce highly realistic and sophisticated HDR images. This paper proposes a
deep learning method to segment the bright and dark regions from an input LDR image and reconstruct the
corresponding HDR image with similar dynamic ranges in the real world. The proposed multi-stage deep
learning network brightens bright regions and darkens dark regions, and features with extended brightness
range are combined to form the HDR image. Dividing the LDR image into the bright and dark regions
effectively implements information on lost over-exposed and under-exposed areas, reconstructing a natural
HDR image with color and appearance that is similar to reality. Experimental results confirm that the
proposed method achieves an 8.52% higher HDR visual difference predictor (HDR-VDP) and a 41.2%
higher log exposure range than current methods. Qualitative evaluation also verifies that the proposed method
generates images that are close in quality to the ground truth.
INDEX TERMS High dynamic range imaging, inverse tone mapping, deep learning, single exposure image.
I. INTRODUCTION
Substantial image processing advances over the past few
years have increased the need for technology that produces
images with real-world fidelity. High dynamic range (HDR)
imaging is the most widely used image quality enhancement
technology.
Fig. 1 shows that HDR generates images with similar
brightness to the range of human perception compared with
low dynamic range (LDR) images with low brightness. However, HDR images should be obtained with expensive devices that can operate beyond the standard camera sensor limitations to acquire a wide dynamic range [1], and hence, most people have limited access to HDR images.
To overcome these limitations, multiple exposure fusion (MEF) methods [2], [3] have emerged, which generate multi-exposure LDR images. The images are then merged to reconstruct an HDR image. However, since multiple images cannot be taken simultaneously, ghost artifacts caused by motion over time occur when merging the bracketed images [4]. Areas with missing information also occur in ranges outside a given exposure.
FIGURE 1. Luminance ranges for high dynamic range (HDR, green), low dynamic range (LDR, blue) and human perception (red).
The associate editor coordinating the review of this manuscript and approving it for publication was Jinhua Sheng.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021 142729
B. D. Lee, M. H. Sunwoo: HDR Image Reconstruction Using Segmented Image Learning
multiscale context aggregation network (CAN) [25]. However, this method [16] also requires a large amount of computation since a large number of images were fed into the network.
To minimize the shortcomings of the CNN-based MEF methods, two exposure fusion (TEF) methods that use only two images have been proposed. Prabhakar et al. [17] proposed a network that fuses the luminance information of two exposure LDR images. Yin et al. [18] combined two exposure images through a prior-aware GAN, which consists of a content prior guided (CPG) encoder and a detail prior guided (DPG) decoder.
However, since TEF methods [17], [18] require two exposure images, the quality of the generated images has a substantial impact on the results. Moreover, similar to the CNN-based MEF methods, it is difficult to restore information that can be observed at exposure values other than the preset exposure values.

B. INVERSE TONE MAPPING METHODS
Akyüz et al. [5] implemented HDR images using linear expansion based on gamma correction. However, efficient range expansion is difficult to achieve unless the input parameters are properly set. Banterle et al. generated HDR images using an expanded map and the median cut algorithm [6] along with a power function and an expansion map [7]. However, these approaches require parameters for density estimation, and the HDR image inferences have limitations for restoring over-exposed regions.
Huo et al. [8] generated HDR images with fewer parameters using a physiological ITM algorithm based on the human visual system (HVS), and Masia et al. [9] proposed an automatic global ITM based on gamma expansion. Nevertheless, the functions employed were insufficient to reconstruct saturated pixels.
In contrast, some proposed methods process a single image by segmenting it into specific ranges. Didyk et al. [10] classified images into diffuse, reflection, and light regions and subsequently applied different enhancements to each area to generate HDR images with a high luminance range. Rempel et al. [11] used Gaussian filtering and an image pyramid to extract pixels with high luminance and assigned information to saturated pixels. These approaches could effectively restore the saturated pixels but were unable to obtain functions satisfying the ground-truth-level color and texture.
The traditional ITM methods generate HDR images by applying a function to LDR images. However, finding a function similar to the inverse CRF is very difficult. Several deep learning methods have recently emerged to solve this tricky problem.
Eilertsen et al. [19] generated HDR images using an autoencoder network with a loss function that considered illuminance and reflectance. They generated HDR images with an increased dynamic range by blending images to convert network outputs and LDR images to linear ranges. However, in contrast to the bright regions generated by the CNN, the dark regions were generated using virtual camera curves. Therefore, they could not effectively restore all saturated regions.
Cai et al. [20] decomposed the input LDR image into a low-frequency luminance region and a high-frequency detail region with weighted least squares (WLS) [26]. Then, the features of each region were extracted through a two-stage network. Cai et al. [20] generated HDR images with increased contrast, even in low-exposure images. However, they did not effectively increase the dynamic range since they focused on increasing the contrast rather than restoring the saturated pixels.
Marnerides et al. [21] utilized regional and local information from LDR images over an end-to-end network and divided it into global, dilation, and local branches. All reconstructions were performed within the network without human intervention. Although the proposed approach could obtain natural HDR images, there were limitations in inferring the clipped pixel values of the under- and over-exposed regions to reach a dynamic range similar to the real world.

Algorithm 1 Finding Threshold
1. Input: H (Histogram), O (Otsu Threshold)
2. Output: thr (Threshold)
3. A = [] # Valley point set
4. T = sum(H)
5. for j in range(O, 252):
6.     k1 = H(j) − H(j − 1) < 0 and H(j + 1) − H(j) > 0
7.     k2 = H(j) − H(j − 2) < 0 and H(j + 2) − H(j) > 0
8.     k3 = H(j) − H(j − 3) < 0 and H(j + 3) − H(j) > 0
9.     if k1 and k2 and k3: # Found a valley point
10.        V1 = var(H(0 : j))
11.        V2 = var(H(j + 1 : 255))
12.        W1 = sum(H(0 : j)) / T
13.        W2 = sum(H(j + 1 : 255)) / T
14.        s = V1 × W1 + V2 × W2
15.        A.append((s, j))
16.    end if
17. end for
18. if A is not empty:
19.     thr = max(A)[1]
20. else:
21.     thr = O
22. end if

Liu et al. [22] proposed a method for generating HDR images using multiple networks based on a conventional camera pipeline structure that generates LDR images. They implemented dequantization, linearization, hallucination, and refinement networks to reverse the process of acquiring LDR images using deep learning. They obtained the final output by sequentially passing the LDR image through these networks. However, there is a significant risk that the inference will change significantly if even one network is not properly learned, since this method utilizes various networks.
Ye et al. [23] showed that the bright and dark region characteristics differed, suggesting that different restoration methods should be employed for the different regions. Although they achieved effective restoration for both under- and over-exposed regions, it was difficult to increase the dynamic range efficiently due to the lack of direct brightness region enhancement.
Most previous CNN-based ITM methods reconstruct HDR images by inputting LDR images to the network without significant pre-processing. Cai et al. [20] segmented the image through pre-processing, but based on only the frequency component. Eilertsen et al. [19] and Ye et al. [23] divided the image based on the brightness value, but [19] divided the image after learning, and [23] segmented it just before the final HDR image was generated. Therefore, it is difficult to claim that [19] and [23] can effectively restore the saturated pixels since the CNN weights only consider creating an image similar to the ground truth image.
Efficient reconstruction of saturated pixels becomes difficult using raw LDR image inputs. Therefore, this paper proposes separating LDR images into bright and dark regions before input and then learning them in different ways.

III. PROPOSED METHOD
Fig. 3 shows the overall flow of the proposed method. The image threshold is calculated during pre-processing, and the Mask is generated. The LDR image is then segmented into bright and dark regions (ImgBright and ImgDark, respectively) using the Mask, and ImgBright and ImgDark become the inputs of the proposed multi-stage network.
ImgBright is learned to produce the increased brightness value in the Brightening block, and ImgDark is learned by the Darkening block to reduce the lower brightness value. Finally, the two feature maps with the enhanced dynamic ranges are combined to obtain the final HDR image. The original image ImgOriginal is also input into the network to help the calculations in the Brightening and Darkening blocks and feature blending.

A. PRE-PROCESSING: LDR IMAGE SEGMENTATION
Bright regions have information for only the bright pixels, and dark regions have information for only the dark pixels. As a result, the proposed network can effectively focus on over- and under-exposed pixels when bright regions and dark regions are fed into the network, respectively. Therefore, the input image is segmented into bright and dark regions during pre-processing to enhance the dynamic range.
Otsu [27] proposed the representative image segmentation method that separates the image into two classes using the threshold value that provides the maximum pixel variance over the image. In most cases, the Otsu [27] method classifies the image very clearly into two classes, ImgBright and ImgDark. However, when the pixel values of the image are similar, the class boundaries are not clear because many pixel values are close to the threshold.
To clarify the difference between the two classes (ImgBright and ImgDark), it is necessary to find the threshold that maximizes the sum of the variances between the two classes, and the threshold should be located at a valley point of the image histogram [28]. Therefore, this paper proposes a method to find the optimal valley point in the histogram. Before applying the proposed method, the brightness value V is measured by converting the RGB image to the HSV color space, and a histogram is generated.
Alg. 1 describes the algorithm of the proposed method, and Fig. 4 shows an example of finding the optimal valley point. As described in Alg. 1, the first step is to increase the pixel value from the Otsu threshold value to 252 (Fig. 4 (a)). The second step is to find the valley points by checking the slope between the histogram value at the current pixel and the histogram values at the points k ∈ {±1, ±2, ±3}, which represent the distance from the current pixel (Alg. 1 lines 6 – 8). Third, the two classes
FIGURE 4. The flow of the proposed segmentation algorithm. (a) Find the Otsu threshold value, (b), (c) find the valley points, and (d) find the optimal valley point.
FIGURE 5. (a) Input LDR image, (b) Image histogram, (c) Bright and dark regions segmented by the proposed method, (d) Bright and dark regions segmented by Otsu [27]. The threshold is 150, which is the same as the Otsu threshold.
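Alg. 1 can be expressed as runnable code. The sketch below is a minimal NumPy implementation, assuming a 256-bin histogram of the V channel; the Otsu threshold is recomputed here by brute force, and the slice conventions H(0 : j) and H(j + 1 : 255) are interpreted as inclusive splits at j, which the excerpt does not pin down.

```python
import numpy as np

def otsu_threshold(hist):
    """Brute-force Otsu: the threshold maximizing between-class variance."""
    total = hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = hist[:t].sum()
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * hist[:t]).sum() / w0
        mu1 = (levels[t:] * hist[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def find_threshold(hist):
    """Alg. 1: search for valley points above the Otsu threshold."""
    O = otsu_threshold(hist)
    T = hist.sum()
    A = []  # (score, index) pairs of candidate valley points
    for j in range(max(O, 3), 252):
        # Lines 6-8 of Alg. 1: j must sit below its +-1, +-2, +-3 neighbors
        k1 = hist[j] - hist[j - 1] < 0 and hist[j + 1] - hist[j] > 0
        k2 = hist[j] - hist[j - 2] < 0 and hist[j + 2] - hist[j] > 0
        k3 = hist[j] - hist[j - 3] < 0 and hist[j + 3] - hist[j] > 0
        if k1 and k2 and k3:
            # Weighted variance score of the two classes split at j
            v1, v2 = np.var(hist[:j + 1]), np.var(hist[j + 1:])
            w1, w2 = hist[:j + 1].sum() / T, hist[j + 1:].sum() / T
            A.append((v1 * w1 + v2 * w2, j))
    return max(A)[1] if A else O  # fall back to Otsu if no valley exists
```

For a flat histogram no valley is found and the Otsu threshold is returned unchanged, which mirrors the fallback in lines 18–22 of Alg. 1.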
FIGURE 7. Proposed multi-stage network architecture. ImgBright, ImgDark, and ImgOriginal are the network inputs. The network is divided into three main stages: 1) dynamic range expansion stage, 2) global feature concatenation stage, and 3) blending and generation stage. Each stage is delineated by dotted lines.
1) The dynamic range extension stage increases the bright range brightness and decreases the dark range brightness;
2) The global feature concatenation stage prevents outliers; and
FIGURE 8. Proposed structures for the (a) Brightening block, (b) Squeeze parameter extraction, and (c) Darkening block.
The squeeze parameter ZB for the Brightening block is obtained by maximum pooling with 4 × 4 pixels in FOriginal and average pooling down to 1 × 1 pixels to properly represent brightness (4).
Akyuz et al. [5] showed that the HVS is more sensitive to dark than bright regions, suggesting that dynamic ranges should be scaled by different amounts depending on the range. In particular, Ye et al. [23] argued that the dark regions' texture and color information are ambiguous, in contrast to the bright regions; hence, it may not be possible to effectively restore under-exposed ranges if the brightness changes significantly. Reibel et al. [31] showed that dark regions filmed with a camera contain more noise than bright regions; hence, unintended artifacts can appear if the change is severe when darkening the dark regions.
Thus, the degree of change in ImgDark should be less than that for ImgBright. Hence, the squeeze parameter ZD used in the Darkening block was obtained using the minimum rather than the maximum pooling (5),

ZB = PAvg(Pmax(FOriginal)) (4)

and

ZD = PAvg(Pmin(FOriginal)), (5)

where FOriginal is a feature map created after the original image passed through the convolution layer. We use the ReLU [30] activation function; hence, Pmax, Pmin, and PAvg denote the maximum, minimum, and average pooling, respectively.
The final channel statistics CB and CD were determined after calculating ZB and ZD,

CB = Fs(Fr(ZB)) (6)

and

CD = Fs(Fr(ZD)), (7)

Consequently, the bright image is brighter and the dark image is darker through the Brightening and Darkening blocks, respectively. Naturally inferred feature maps can be obtained since the dynamic range expansion was achieved within a single network.

2) GLOBAL FEATURE CONCATENATION
Unwanted outliers can be generated in the final HDR image because the level at which brightness values change is determined automatically within the proposed deep learning network. Although the image is smoothly segmented during pre-processing, artifacts can appear around the segmented image boundaries when combining the two feature maps, since ImgBright and ImgDark learning were conducted using different parameters.
Therefore, we used the global branch structure proposed by Marnerides et al. [21] to generate a feature map with abstract features from the original image, which limits significant deformation of the image learned through this feature map. Fig. 7 shows the generated global image feature FGlobal with size 64 that is subsequently used as

FBG = FP([FBright, FGlobal]), (10)
FDG = FP([FDark, FGlobal]), (11)
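The squeeze statistics (4), (5) and channel statistics (6), (7) can be sketched as follows. This is a hedged NumPy illustration, not the authors' implementation: the 4 × 4 max/min pooling followed by global average pooling follows the text, but Fr and Fs are assumed here to be plain ReLU and sigmoid activations, and any learned layers inside Fr/Fs (which the excerpt does not describe) are omitted.

```python
import numpy as np

def pool2d(x, k, op):
    """Non-overlapping k x k spatial pooling over an (H, W, C) feature map."""
    h, w, c = x.shape
    x = x[:h - h % k, :w - w % k]                # drop ragged borders
    x = x.reshape(h // k, k, w // k, k, c)
    return op(x, axis=(1, 3))

def squeeze_parameters(f_original):
    """Z_B (4) and Z_D (5): per-channel brightness statistics of F_Original."""
    z_b = pool2d(f_original, 4, np.max).mean(axis=(0, 1))  # P_Avg(P_max(F))
    z_d = pool2d(f_original, 4, np.min).mean(axis=(0, 1))  # P_Avg(P_min(F))
    return z_b, z_d

def channel_statistics(z):
    """C = F_s(F_r(Z)) (6), (7); F_r/F_s assumed to be ReLU and sigmoid."""
    relu = np.maximum(z, 0.0)
    return 1.0 / (1.0 + np.exp(-relu))
```

By construction ZB is at least ZD for every channel, so the Brightening block always receives the larger brightness statistic, which matches the intent of using maximum versus minimum pooling.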
A sigmoid function was used in the final layer to map the generated HDR image to the [0, 1] range.

C. LOSS FUNCTION
The Weber–Fechner law [34] states that stimuli should be transformed into a constant ratio rather than a constant quantity for human senses to detect intensity changes because luminance variation is logarithmic; hence, loss functions should be handled in the logarithmic domain to create the HDR image optimized for human senses. Therefore, we used the logarithmic loss function

L(ĥ, h) = LRGB(ĥ, h) + βLLuminance(ĥ, h), (17)

where ĥ is the HDR image estimated by the network, h is the ground truth HDR image, and β is the loss function weight. We set β = 0.3 to generate the highest experimental result and implemented the loss function as a sum of LRGB, comparing pixel values in RGB, and LLuminance in luminance to reconstruct the HDR image as close to real life as possible,

LRGB(ĥ, h) = |log(ĥ + ε) − log(h + ε)| (18)

and

LLuminance(ĥ, h) = |log(Y(ĥ) + ε) − log(Y(h) + ε)|. (19)

LRGB (18) and LLuminance (19) were based on the L1 distance and use a small constant ε to avoid the singularity at a zero pixel value. LRGB (18) computes the difference between the ground truth and the predicted values in the RGB range, whereas LLuminance (19) computes it in the luminance range. According to [1], the pixel value of the HDR image is an approximation of photometric quantities, in contrast with the LDR image, and is in general linearly related to luminance. Therefore, it is reasonable to evaluate the HDR image in the luminance range, extracting luminance values from the RGB channels,

Y = 0.2126R + 0.7152G + 0.0722B, (20)

where R, G, and B indicate the red, green, and blue channel pixel values, respectively. Finally, we obtain the absolute difference between the estimated and ground truth images after converting HDR pixel values to the luminance range using (20).

IV. EXPERIMENTS
A. DATASET
Datasets are important factors in determining deep learning networks. Moreover, objectivity is maintained by learning in an environment with similar objects. Therefore, we used the most common HDR image reconstruction datasets: Funt [35], [36], Stanford [37], Ward [38], [39], and Fairchild [40]. Since these datasets include only HDR images rather than LDR–HDR pairs, we employed tone mapping [41]–[43] to synthesize corresponding LDR images, obtaining 4,935 total LDR images by randomly cropping 512 × 512 regions five times for each tone mapping.
We also used the HDR-Real dataset organized by Liu et al. [22], which includes 480 HDR and 4,893 LDR images. The LDR images were taken with 42 different cameras and converted into HDR images using Photomatix software [44]. Therefore, there was no need to generate additional LDR images, unlike the other datasets. Of the 9,828 data images, we used 8,118 images for training and the remaining 1,710 for testing. The training images were resized to 256 × 256 pixels, and HDR images were normalized onto [0, 1] for uniform learning. For testing, we used the 512 × 512 pixel images.

B. TRAINING
Training and testing were implemented using Python v3.6 on an Intel i7-7820X CPU with an Nvidia GTX 1080 Ti GPU. We conducted 200 training epochs on the 8,118 images, and the results were checked every 20 epochs to prevent overfitting. The initial learning rate was 0.00007, and we used the Adam optimizer [45]. Weights were initialized using Xavier initialization [46], the mini-batch size was 8, and 1,014 iterations were run for each epoch.

V. RESULTS
This section compares the results from the proposed and the existing methods using the peak signal-to-noise ratio (PSNR) [47], structural similarity index measure (SSIM) [48], and multiscale SSIM (MS-SSIM) [49] metrics, which are the most common evaluation indicators. PSNR is calculated based on the MSE, which is the Euclidean distance between pixels, with a higher value implying better similarity to the original. SSIM measures the similarity to the original image using the pixel variance and mean from the two images. MS-SSIM is calculated by adding scale-space concepts to SSIM. Reconstructed images are structurally similar to the original images where SSIM and MS-SSIM are close to 1.
PSNR, SSIM, and MS-SSIM express mathematical pixel differences between the original and the comparison image. However, HDR images should be measured based on human visual perception as well as mathematical differences. Therefore, in this paper, we also use the quality Q score of the HDR visual difference predictor (HDR-VDP) [50] as an evaluation metric, which is based on a human visual model for all luminance environments and has the highest level of objectivity for assessing HDR image quality. The log exposure range is used to evaluate the brightness range, which is the most critical element of the HDR image,

Log Exposure Range = log2(lHigh) − log2(lLow), (21)

where lHigh and lLow are the highest and lowest luminance, respectively. The dynamic range of the HDR image is evaluated by comparing the log exposure range with reference methods, including the traditional and previous deep learning methods. The traditional ITM methods include Akyuz et al. [5], Huo et al. [8], and Masia et al. [9], referred to as AEO, HUO, and MEO, respectively, and the deep learning methods include Endo et al. [12], Eilertsen et al. [19],
Cai et al. [20], and Marnerides et al. [21], referred to as DrTMO, HDRCNN, SICE, and EXP, respectively.
TABLE 1. Comparisons of image quality among the reference methods and the proposed method. Bold indicates the best score.
TABLE 2. Comparisons of log exposure range among the reference methods and the proposed method. Bold indicates the best score.
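The loss defined in (17)–(20) can be sketched in NumPy as below. The mean reduction over pixels and the value of ε are assumptions not fixed by the excerpt; the paper sets β = 0.3.

```python
import numpy as np

EPS = 1e-5  # small constant epsilon; its exact value is not given in the paper

def luminance(img):
    """Eq. (20): Y = 0.2126 R + 0.7152 G + 0.0722 B for an (..., 3) RGB image."""
    return img @ np.array([0.2126, 0.7152, 0.0722])

def hdr_loss(h_hat, h, beta=0.3):
    """Eqs. (17)-(19): log-domain L1 loss, L = L_RGB + beta * L_Luminance."""
    l_rgb = np.abs(np.log(h_hat + EPS) - np.log(h + EPS)).mean()
    l_lum = np.abs(np.log(luminance(h_hat) + EPS)
                   - np.log(luminance(h) + EPS)).mean()
    return l_rgb + beta * l_lum
```

Because both terms are absolute differences of logarithms, a prediction that is off by a constant factor incurs the same penalty at every brightness level, which is the Weber–Fechner motivation given in the loss section.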
A. QUANTITATIVE COMPARISON
Table 1 compares PSNR, SSIM, MS-SSIM, and HDR-VDP
for the various methods. The proposed method achieves the
highest score on all four evaluation metrics. In particular,
the structural detail is increased, returning higher SSIM and
MS-SSIM scores than the other methods due to the increased
contrast with a wider brightness range. The VDP-Q score
is also the highest for the proposed method, i.e., the best
HDR image is reconstructed with respect to human cognitive aspects.
The deep learning methods [12], [19]–[21] use map-ranged ground truth images in the range [0, 1]. In contrast, the traditional ITM methods [5], [8], [9] multiply the LDR image by specific functions without a specified range to create the HDR image. Consequently, when mapping the HDR images of [5], [8], [9] generated for objective comparison to a range of [0, 1], the minimum and maximum pixel values are set to 0 and 1. Therefore, the size of the log exposure range becomes infinite, which makes it impossible to objectively compare with the deep learning methods. Therefore, Fig. 13 compares the AEO, HUO, and MEO luminance ranges with the proposed method by adjusting the exposure values.
Table 2 compares the log exposure range between the proposed method and the existing deep learning methods [12], [19]–[21], where the entire test dataset (1,710 images) includes 307 night or dark room images with low brightness and 322 daytime or bright room images with high brightness. The mean log exposure range for the ground truth images = 16.15 Stops, whereas HDR images generated by the proposed method achieve a mean log exposure range = 15.07 Stops. The mean log exposure range of the proposed method is the closest result to the ground truth, with a subsequent decreasing order of HDRCNN, SICE, EXP, and DrTMO.
The proposed method also achieves the best results for test images with high brightness, a ground truth error = 2.4% and a mean log exposure range = 12.13 Stops, which means that the saturated pixels are effectively restored, even in images with many over-exposed regions. In addition, for test images with low brightness, the proposed method shows a mean log exposure range = 15.28 Stops, which is 38.49% higher on average than those of the other deep learning methods [12], [19]–[21].

B. QUALITATIVE COMPARISON
In this paper, we use Reinhard [43] tone mapping with the 8-bit LDR image for the HDR image to compare with the proposed method. Qualitative image comparisons are evaluated for less saturated and highly saturated LDR image inputs. The dynamic brightness range is compared by increasing the HDR image exposure.

1) LESS SATURATED LDR IMAGES
To evaluate images in various environments, photographs taken in bright and dark places were compared. Fig. 9 compares the LDR image for a park in the daytime with the reconstructed HDR image. The sun shape, nearby tree details (red box), and bush color information (blue box) were damaged due to the bright light sources.
The HDR image generated by the proposed method (j) was closest to the ground truth (b). In particular, the image captured the sun shape (red box) due to high brightness most naturally and had the most similar color to the ground truth (blue box).
The traditional ITM methods (c) – (e) generated HDR images that were similar to the input LDR image (a), but the deep learning methods (f) – (j) reconstructed HDR images with better quality, with SICE (h) generating the second best HDR image after the proposed method. However, SICE (h) did not accurately reconstruct the sun shape (red box), HDRCNN (g) and EXP (i) had lower overall contrast than the proposed method, and DrTMO (f) generated an overall blurred HDR image.
Fig. 10 compares HDR images inferred from the LDR image of the park at night. Information regarding the streetlights (red box) and the grass (blue box) reflecting the streetlights was lost due to over-exposed ranges having a higher brightness than under-exposed ranges. Similar to the case for daytime park images, the traditional ITM methods (c) – (e) generated HDR images that did not adequately restore the saturated
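The log exposure range metric (21) used for Table 2 can be computed as in the following sketch; the optional percentile arguments are an assumption added to keep the metric finite when stray zero-valued pixels occur, not part of the paper's definition.

```python
import numpy as np

def log_exposure_range(hdr, lo_pct=0.0, hi_pct=100.0):
    """Eq. (21): stops between the brightest and darkest luminance.
    Percentile selection (an assumption, not in the paper) allows
    excluding outlier pixels; the defaults use the exact min and max."""
    lum = hdr @ np.array([0.2126, 0.7152, 0.0722])  # eq. (20) luminance
    l_low = np.percentile(lum, lo_pct)
    l_high = np.percentile(lum, hi_pct)
    return np.log2(l_high) - np.log2(l_low)
```

A gray image whose luminance spans 1 to 2^16 therefore measures 16 stops, matching the "Stops" unit used in Table 2.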
FIGURE 9. Comparisons of the degree of inference for over-exposed regions on an example bright image: (a) Input LDR image,
(b) Ground truth; Traditional ITM methods: (c) AEO [5], (d) HUO [8], and (e) MEO [9]; Deep learning methods: (f) DrTMO [12],
(g) HDRCNN [19], (h) SICE [20], and (i) EXP [21]; (j) Proposed method.
pixel values for the lights (red box) and grass (blue box) and differed little from the input LDR image (a). DrTMO (f) generated an unnatural HDR image compared to the ground truth (b) in the process of merging the bracketed images. HDRCNN (g), SICE (h), and EXP (i) detected saturated pixels for the streetlights (red box) and grass (blue box) better than the other methods, but they exhibited significant sharpness loss compared to the ground truth (b).
In contrast, the proposed method (j) restored the saturated streetlight information (red box) closest to the ground truth (b) and was also able to realize similar grass color and tone (blue box). Thus, the proposed method could restore the clearest HDR image.

2) HIGHLY SATURATED LDR IMAGES
Fig. 11 compares the outcomes for the LDR image that was severely clipped and therefore lost most of its information before being restored. The original LDR image (a) was taken inside a dark building, with considerable information loss outside the relatively bright building (red box) compared to the interior.
As shown in Fig. 11, the traditional ITM methods (c) – (e) that did not use deep learning did not properly restore the information outside of the building (red box). On the other hand, the methods using deep learning effectively inferred the saturated pixels based on the limited information, and in particular, the proposed method (j) restored the most out-of-building information (red box) corrupted by over-exposure. DrTMO (f) synthesized the HDR image from stacks of variously exposed images, effectively realizing the damaged region in the bright range, and hence restored more shapes than the other comparison methods. However, DrTMO (f) was quite blurred, with the overall color tone and contrast differing greatly from the ground truth (b). EXP (i) could not restore as many saturated pixels (red box) as DrTMO (f), but overall reconstructed a good HDR image, since EXP (i) efficiently used local and global information from the input LDR image. SICE (h) extracted the features of the low- and high-frequency information of the image and generated the HDR image with color and contrast close to the ground truth (b), but it could not restore the detailed information of over-exposed pixels.
Fig. 12 shows example regions where under-exposed information was damaged, except for very bright pixels, since the exposure time was very short. Most LDR image information was damaged, aside from the tree bulbs, making it difficult to visibly identify image shapes (red box). Generating the HDR image from such a damaged LDR image is a challenging task.
Not all comparison methods could generate images close to the ground truth (b). HUO (d) could not restore most information, and although MEO (e), DrTMO (f), SICE (h), and EXP (i) restored the statue (red box) to some extent, the restored images contain substantial noise and are severely blurred. AEO (c) and HDRCNN (g) were clearer than those by MEO (e), DrTMO (f), SICE (h), and EXP (i), but the colors deviated considerably from the ground truth (b). In contrast, the image reconstructed by the proposed method (j) achieved higher clarity and a similar color tone to the ground truth.
FIGURE 10. Comparisons of the degree of inference for over-exposed regions on example dark image: (a) Input LDR
image, (b) Ground truth; Traditional ITM methods: (c) AEO [5], (d) HUO [8], and (e) MEO [9]; Deep learning methods:
(f) DrTMO [12], (g) HDRCNN [19], (h) SICE [20], and (i) EXP [21]; (j) Proposed method.
FIGURE 11. Comparisons of over-exposed area restoration capability: (a) Input LDR image, (b) Ground truth; Traditional ITM methods:
(c) AEO [5], (d) HUO [8], and (e) MEO [9]; Deep learning methods: (f) DrTMO [12], (g) HDRCNN [19], (h) SICE [20], and (i) EXP [21];
(j) Proposed method.
FIGURE 12. Comparisons of under-exposed area restoration capability: (a) Input LDR image, (b) Ground truth; Traditional ITM
methods: (c) AEO [5], (d) HUO [8], and (e) MEO [9]; Deep learning methods: (f) DrTMO [12], (g) HDRCNN [19], (h) SICE [20], and
(i) EXP [21]; (j) Proposed method.
3) COMPARISON FOR DYNAMIC RANGE
The proposed method generates images with a wider dynamic range than the existing approaches (see Table 2). For qualitative analysis of the dynamic range, we observed the HDR image up to 4 exposure value (EV) units. We did not use the Reinhard [43] tone mapping method to compare methods; rather, we multiplied the HDR image by 255 and converted the values into integers.
Fig. 13 shows that the proposed method gives the most information for all of the exposure values. All considered methods showed similar information when the exposure was increased by 4 EVs. However, DrTMO (d) lost the most pixel
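The exposure sweep described above (scale a linear HDR image by 2^EV, multiply by 255, and convert to integers, with no tone mapping) can be sketched as follows; clipping to [0, 1] before quantization is an assumption on my part, since the excerpt does not say how out-of-range values are handled.

```python
import numpy as np

def expose(hdr, ev):
    """Render a linear [0, 1] HDR image at a given exposure value (EV):
    scale by 2**ev, clip to [0, 1] (assumed), then quantize to 8 bits
    by multiplying by 255 and truncating to integers."""
    scaled = np.clip(hdr * (2.0 ** ev), 0.0, 1.0)
    return (scaled * 255.0).astype(np.uint8)
```

Sweeping ev from 0 to 4 as in Fig. 13 progressively saturates bright pixels, so a method with a wider reconstructed dynamic range retains visible detail at higher EV settings.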
TABLE 4. Comparisons of image quality between the proposed network without the global feature and the proposed network with the global feature. Segmented images are fed into each network. Bold indicates the best score.
FIGURE 15. Comparisons of HDR images between the proposed network without the global feature and the proposed network with the global feature.
regenerate high-quality HDR images [12]–[23]. However, the LDR image should be separated and learned according to brightness values to accurately reflect reality.
Reconstructed HDR images using the proposed method achieved average PSNR, SSIM, MS-SSIM, and HDR-VDP2 metrics 30.4%, 42.8%, 42.9%, and 8.52% higher than the previous methods, respectively. In particular, the proposed method provided a mean log exposure range 41.2% higher, confirming that the dynamic range was the most similar to the real world. Furthermore, qualitative evaluations showed that the proposed method generated an HDR image similar to the original image with the highest resilience, even though the input LDR image had lost considerable pixel information.
Generating an image close to the real world using limited information from the clipped LDR image remains difficult. The proposed method could not reconstruct irradiance perfectly, and LDR images with severely damaged exposure values remain unable to generate HDR images close to the ground truth. We think this issue can be resolved by classifying images into more classes for training. In addition to the bright and dark ranges, the LDR image could be further classified by brightness, helping the clipped ranges to be effectively generated.
REFERENCES
In both (a) and (b), when the segmented image is used as
[1] R. K. Mantiuk, K. Myszkowski, and H. P. Seidel, ‘‘High dynamic range
input, the color and shape information of the saturated pixels imaging,’’ in Wiley Encyclopedia of Electrical and Electronics Engineer-
are restored in more detail, and the contrast of the image ing. Hoboken, NJ, USA: Wiley, 2015, pp. 1–42.
increases as the dynamic range improves. [2] P. E. Debevec and J. Malik, ‘‘Recovering high dynamic range radiance
maps from photographs,’’ in Proc. ACM SIGGRAPH Classes, Aug. 2008,
pp. 1–10.
D. COMPARISON WITH AND WITHOUT GLOBAL FEATURE [3] T. Mertens, J. Kautz, and F. Van Reeth, ‘‘Exposure fusion,’’ in Proc. 15th
CONCATENATION Pacific Conf. Comput. Graph. Appl. (PG), Maui, HI, USA, Oct./Nov. 2007,
pp. 382–390.
To prevent artifacts from appearing, the global feature FGlobal [4] A. Srikantha and D. Sidibé, ‘‘Ghost detection and removal for high
used in [21] is concatenated with FBright , FDark , and FBG C dynamic range images: Recent advances,’’ Signal Process., Image Com-
mun., vol. 27, no. 6, pp. 650–662, 2012.
FDG (10), (11), (12). We compared the results of cases with
[5] A. O. Akyüz, R. Fleming, B. E. Riecke, E. Reinhard, and H. H. Bülthoff,
and without FGlobal in the proposed network that uses the ‘‘Do HDR displays support LDR content? A psychophysical evaluation,’’
segmented images as input to verify whether FGlobal reduces ACM Trans. Graph., vol. 26, no. 3, pp. 1–7, Jul. 2007.
the number of artifacts. Table 4 compares the quantitative [6] F. Banterle, P. Ledda, K. Debattista, and A. Chalmers, ‘‘Inverse tone map-
ping,’’ in Proc. 4th Int. Conf. Comput. Graph. Interact. Techn. Australasia
evaluation with and without FGlobal in the proposed network. Southeast Asia, New York, NY, USA, 2006, pp. 349–356.
As shown in Table 4, when FGlobal was used in the pro- [7] F. Banterle, K. Debattista, A. Artusi, S. Pattanaik, K. Myszkowski,
posed network, the PSNR was 4.1% higher than that without P. Ledda, and A. Chalmers, ‘‘High dynamic range imaging and low
dynamic range expansion for generating HDR content,’’ Comput. Graph.
FGlobal . In other words, using FGlobal could improve the Forum, vol. 28, no. 8, pp. 2343–2367, 2009.
overall HDR image quality since the number of artifacts could [8] Y. Huo, F. Yang, L. Dong, and V. Brost, ‘‘Physiological inverse tone
be reduced on the segmentation boundary. mapping based on retina response,’’ Vis. Comput., vol. 30, no. 5,
pp. 507–517, 2014.
Fig. 15 shows the HDR images reconstructed by the pro-
[9] B. Masia, A. Serrano, and D. Gutierrez, ‘‘Dynamic range expansion based
posed network with FGlobal and the proposed network with- on image statistics,’’ Multimedia Tools Appl., vol. 76, no. 1, pp. 631–648,
out FGlobal . As shown in Fig. 15, when FGlobal was not used, 2017.
artifacts appeared at the division boundary near the reflected [10] P. Didyk, R. Mantiuk, M. Hein, and H. P. Seidel, ‘‘Enhancement of bright
video features for HDR displays,’’ Comput. Graph. Forum, vol. 27, no. 4,
light area of the ceiling. However, when FGlobal was used, pp. 1265–1274, 2008.
the boundary of the reflected light area of the ceiling showed [11] A. G. Rempel, M. Trentacoste, H. Seetzen, H. D. Young, W. Heidrich,
a natural shape without any artifacts. L. Whitehead, and G. Ward, ‘‘Ldr2Hdr: On-the-fly reverse tone mapping of
legacy video and photographs,’’ ACM Trans. Graph., vol. 26, no. 3, p. 39,
2007.
VI. CONCLUSION [12] Y. Endo, Y. Kanamori, and J. Mitani, ‘‘Deep reverse tone mapping,’’ ACM
Recent developments in deep learning technology have sub- Trans. Graph., vol. 36, no. 6, p. 177, 2017.
[13] S. Lee, G. H. An, and S.-J. Kang, ‘‘Deep chain HDRI: Reconstructing a
stantially advanced ITM approaches, with many studies high dynamic range image from a single low dynamic range image,’’ IEEE
employing deep learning to reconstruct saturated pixels and Access, vol. 6, pp. 49913–49924, 2018.
[14] S. Lee, G. H. An, and S.-J. Kang, "Deep recursive HDRI: Inverse tone mapping using generative adversarial networks," in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 596–611.
[15] S. Lee, S. Y. Jo, G. H. An, and S.-J. Kang, "Learning to generate multi-exposure stacks with cycle consistency for high dynamic range imaging," IEEE Trans. Multimedia, vol. 23, pp. 2561–2574, 2021.
[16] K. Ma, Z. Duanmu, H. Zhu, Y. Fang, and Z. Wang, "Deep guided learning for fast multi-exposure image fusion," IEEE Trans. Image Process., vol. 29, pp. 2808–2819, 2020.
[17] K. R. Prabhakar, V. S. Srikar, and R. V. Babu, "DeepFuse: A deep unsupervised approach for exposure fusion with extreme exposure image pairs," in Proc. ICCV, Oct. 2017, pp. 4724–4732.
[18] J.-L. Yin, B.-H. Chen, and Y.-T. Peng, "Two exposure fusion using prior-aware generative adversarial network," IEEE Trans. Multimedia, early access, Jun. 15, 2021, doi: 10.1109/TMM.2021.3089324.
[19] G. Eilertsen, J. Kronander, G. Denes, R. K. Mantiuk, and J. Unger, "HDR image reconstruction from a single exposure using deep CNNs," ACM Trans. Graph., vol. 36, no. 6, p. 178, 2017.
[20] J. Cai, S. Gu, and L. Zhang, "Learning a deep single image contrast enhancer from multi-exposure images," IEEE Trans. Image Process., vol. 27, no. 4, pp. 2049–2062, Apr. 2018.
[21] D. Marnerides, T. Bashford-Rogers, J. Hatchett, and K. Debattista, "ExpandNet: A deep convolutional neural network for high dynamic range expansion from low dynamic range content," Comput. Graph. Forum, vol. 37, no. 2, pp. 37–49, May 2018.
[22] Y.-L. Liu, W.-S. Lai, Y.-S. Chen, Y.-L. Kao, M.-H. Yang, Y.-Y. Chuang, and J.-B. Huang, "Single-image HDR reconstruction by learning to reverse the camera pipeline," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 1651–1660.
[23] N. Ye, Y. Huo, S. Liu, and H. Li, "Single exposure high dynamic range image reconstruction based on deep dual-branch network," IEEE Access, vol. 9, pp. 9610–9624, 2021.
[24] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," 2015, arXiv:1505.04597. [Online]. Available: http://arxiv.org/abs/1505.04597
[25] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," in Proc. 4th Int. Conf. Learn. Represent. (ICLR), Nov. 2016, pp. 1–13.
[26] Z. Farbman, R. Fattal, D. Lischinski, and R. Szeliski, "Edge-preserving decompositions for multi-scale tone and detail manipulation," ACM Trans. Graph., vol. 27, no. 3, pp. 1–10, Aug. 2008.
[27] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Trans. Syst., Man, Cybern., vol. SMC-9, no. 1, pp. 62–66, Jan. 1979.
[28] J.-L. Fan and B. Lei, "A modified valley-emphasis method for automatic thresholding," Pattern Recognit. Lett., vol. 33, no. 6, pp. 703–708, 2012.
[29] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, "Image super-resolution using very deep residual channel attention networks," in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 286–301.
[30] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, Jan. 2013, pp. 1–6.
[31] Y. Reibel, M. Jung, M. Bouhifd, B. Cunin, and C. Draman, "CCD or CMOS camera noise characterisation," Eur. Phys. J. Appl. Phys., vol. 21, no. 1, pp. 75–80, Jan. 2003.
[32] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1026–1034.
[33] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4700–4708.
[34] G. Fechner, Elements of Psychophysics, vol. 1. Washington, DC, USA: American Psychological Assoc., 1966.
[35] B. Funt and L. Shi, "The rehabilitation of MaxRGB," in Proc. Color Imag. Conf., Nov. 2010, pp. 256–259.
[36] B. Funt and L. Shi, "The effect of exposure on MaxRGB color constancy," Proc. SPIE, vol. 7527, Feb. 2010, Art. no. 75270Y.
[37] F. Xiao, J. M. DiCarlo, P. B. Catrysse, and B. A. Wandell, "High dynamic range imaging of natural scenes," in Proc. Color Imag. Conf., 2002, pp. 337–342.
[38] E. Reinhard, W. Heidrich, P. Debevec, S. Pattanaik, G. Ward, and K. Myszkowski, High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting. Burlington, MA, USA: Morgan Kaufmann, 2010.
[39] G. Ward. (2006). High Dynamic Range Image Encodings. [Online]. Available: https://www.anyhere.com/gward/hdrenc/hdr_encodings.html
[40] M. D. Fairchild, "The HDR photographic survey," in Proc. Color Imag. Conf., 2007, pp. 233–238.
[41] F. Drago, K. Myszkowski, T. Annen, and N. Chiba, "Adaptive logarithmic mapping for displaying high contrast scenes," Comput. Graph. Forum, vol. 22, no. 3, pp. 419–426, 2003.
[42] R. Mantiuk, S. Daly, and L. Kerofsky, "Display adaptive tone mapping," ACM Trans. Graph., vol. 27, no. 3, pp. 1–28, 2008.
[43] E. Reinhard, M. Stark, P. Shirley, and J. Ferwerda, "Photographic tone reproduction for digital images," ACM Trans. Graph., vol. 21, no. 3, pp. 267–276, Jul. 2002.
[44] HDRsoft. Photomatix. [Online]. Available: https://www.hdrsoft.com
[45] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980
[46] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. 13th Int. Conf. Artif. Intell. Statist., 2010, pp. 249–256.
[47] P. C. Teo and D. J. Heeger, "Perceptual image distortion," in Proc. 1st Int. Conf. Image Process., Nov. 1994, pp. 982–986.
[48] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[49] Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multiscale structural similarity for image quality assessment," in Proc. 37th Asilomar Conf. Signals, Syst. Comput., vol. 2, Nov. 2003, pp. 1398–1402.
[50] M. Narwaria, R. K. Mantiuk, M. P. Da Silva, and P. Le Callet, "HDR-VDP-2.2: A calibrated method for objective quality prediction of high-dynamic range and standard images," J. Electron. Imag., vol. 24, no. 1, 2015, Art. no. 010501.

BYEONG DAE LEE (Graduate Student Member, IEEE) received the B.S. degree in electronics engineering from Ajou University, Suwon, South Korea, in 2020, where he is currently pursuing the M.S. degree in electronic engineering. His current research interests include image processing, computer vision, deep learning, and SoC design for deep learning.

MYUNG HOON SUNWOO (Fellow, IEEE) received the B.S. degree from Sogang University, in 1980, the M.S. degree in electrical engineering from KAIST, in 1982, and the Ph.D. degree in electronics and communication engineering from The University of Texas at Austin, Austin, TX, USA, in 1990. He worked with ETRI, Daejeon, South Korea, from 1982 to 1985, and with the Digital Signal Processor Operations, Motorola, Austin, from 1990 to 1992. Since 1992, he has been with the School of Electrical and Computer Engineering, Ajou University, Suwon, South Korea, where he is currently a Professor. He is also the Director of the Medical Image Based Intelligent Diagnostic Solutions (MIIDS) Research Center. He has authored over 450 articles, holds more than 100 patents, and has won more than 50 awards. His current research interests include artificial intelligence circuits and systems, low-power algorithms and architectures, medical imaging diagnosis, and deep learning. He served as the General Chair for the International Symposium on Circuits and Systems (ISCAS) 2012, Seoul, South Korea, and the General Co-Chair for ISCAS 2021, Daegu, South Korea. As the IEEE CASS VP-Conferences, he launched the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), in 2019. He was a Distinguished Lecturer with IEEE CASS, from 2009 to 2010, and served on the IEEE CASS Board of Governors (BoG), from 2011 to 2016. He was an Associate Editor of IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, from 2002 to 2003, and is a co-editor of books including "Selected Topics in Biomedical Circuits and Systems" (River Publishers Series in Circuits and Systems, 2021).