
EEDNet: Enhanced Encoder-Decoder Network for AutoISP

Yu Zhu1, Zhenyu Guo2,3, Tian Liang2,3, Xiangyu He2, Chenghua Li2,4(B), Cong Leng2,4, Bo Jiang1, Yifan Zhang2,4, and Jian Cheng2,4(B)

1 School of Computer Science and Technology, Anhui University, Hefei 230601, China
zhuyu.cv@gmail.com
2 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
{xiangyu.he,jcheng}@nlpr.ia.ac.cn, lichenghua2014@ia.ac.cn
3 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
4 Nanjing Artificial Intelligence Chip Research, Institute of Automation, Chinese Academy of Sciences (AiRiA), Nanjing, China

Abstract. The Image Signal Processor (ISP) plays a core role in camera systems.
However, ISP tuning is highly complicated and requires professional skills and
advanced imaging experience. To skip the painful ISP tuning process, we introduce
EEDNet, which directly transforms an image in the raw space to an image in the
sRGB space (RAW-to-RGB). Data-driven RAW-to-RGB mapping is a brand new low-level
vision task. In this work, we propose a hypothesis about the receptive field: a
large receptive field (LRF) is essential in high-level computer vision tasks, but
not crucial in low-level pixel-to-pixel tasks. Besides, we present a ClipL1 loss,
which simultaneously accounts for easy examples and outliers during the
optimization process. Benefiting from the LRF hypothesis and the ClipL1 loss,
EEDNet can generate high-quality pictures with more details. Our method achieves
promising results on the Zurich RAW2RGB (ZRR) dataset and won first place in the
AIM 2020 ISP challenge.
Keywords: ISP · RAW-to-RGB · LRF hypothesis · ClipL1
1 Introduction
An Image Signal Processor (ISP) is a specialized digital signal processor for
reconstructing RGB images from raw Bayer images. In conventional camera pipelines,
whether in smartphones or DSLR cameras, complex and confidential hardware
processes are employed to perform image signal processing. Meanwhile, ISP tuning
is highly complicated, and professional skills and advanced imaging experience are
indispensable. The pipeline consists of various processing steps including
denoising, white balancing, exposure correction, demosaicing, colour transform,
gamma encoding and so on. Because every step in a conventional ISP is performed
sequentially with an independent task-specific loss function, residual error
accumulates along the pipeline [17]. To correct these stepwise accumulated errors,
a tedious parameter tuning process has to be employed at the later stages.

Y. Zhu, Z. Guo, T. Liang, X. He: Equal contribution.
© Springer Nature Switzerland AG 2020
A. Bartoli and A. Fusiello (Eds.): ECCV 2020 Workshops, LNCS 12537, pp. 171–184, 2020.
https://doi.org/10.1007/978-3-030-67070-2_10

More concretely, many conventional methods use hand-crafted, heuristics-based
approaches to derive the solution at each step of the ISP pipeline, leaving a
large number of parameters to be tuned for the complicated and volatile
environments of the real world. Besides, performing the various ISP processes
sequentially with module-based algorithms results in cumulative errors at every
step. A small change in the parameter configuration may lead to a different
reconstructed RGB image.
Meanwhile, smartphones have gradually become a part of daily life. With the
continuous improvement of mobile phone cameras, high-quality photos have gone from
being the privilege of professional cameras to something that ordinary people can
easily access. Heavy image signal processing systems are embedded in phones to
improve photo quality. However, due to the limited hardware resources of mobile
cameras, a big gap may always remain between phone and professional cameras. How
to make the picture quality of a mobile phone camera as close as possible to that
of a professional one has become our concern. It is known that a well-adjusted ISP
can bring competitive quality to the images taken by smartphones. Nevertheless,
designing an ISP and adjusting the parameters of its internal modules are far from
simple. For camera and smartphone manufacturers, the ISP is regarded as a core
competency. In light of this, we propose EEDNet to avoid the painful ISP tuning
process and to narrow the gaps between smartphone cameras with different ISP
pipelines. EEDNet uses a unified loss function to optimize the entire processing
chain of an ISP pipeline in an end-to-end setting.
Each module in a traditional ISP can neither control the output of other modules
nor recover the signal loss of previous modules. The idea of replacing the
hardware-based ISP with a convolutional neural network (CNN) is supported by the
fact that a CNN can compensate for the information loss of input images, which
makes it more reliable than the traditional ISP, and can effectively break through
the hardware limitation. Ignatov et al. [9] pioneered the application of CNNs to
replace the camera ISP of smartphones and proposed the RAW-to-RGB dataset together
with the PyNET network.
In this paper, we show that deep neural networks with Large Receptive Fields
(LRF) are not required in this task. In contrast to the popular design in object
detection [12,18] and semantic segmentation [4], which emphasize semantic infor-
mation, we assume that low-level image processing tasks such as RAW-to-RGB
could pay more attention to local structures. To further verify our hypothesis,
we conduct extended experiments on SIDD+ [1]. The results show that U-Net
[19] without LRF can also obtain promising results.
Our main contributions can be summarized as follows:
– We show empirically that the RAW-to-RGB task does not require LRF in the
encoder-decoder structure. Furthermore, we verify our hypothesis on the SIDD+ task.
– We propose ClipL1 loss, which eliminates the effect of easy examples and
outliers during training.
– We present EEDNet with a desirable receptive field configuration, which
outperforms PyNET.
2 Related Work
In this section, we briefly review related work on image signal processing in two
parts: convolutional neural networks for low-level vision tasks, and previous
works that use deep learning techniques to learn the ISP pipeline.
2.1 CNN for Low-Level Vision

In recent years, deep learning techniques have been widely used in low-level
vision tasks, including moire pattern removal [20,25], denoising [26,27],
super-resolution [6,23], high dynamic range expansion [14,24], deblurring [15,21]
and bokeh rendering [7,16]. CNNs have become popular solutions to various single
imaging tasks. Sun et al. proposed a deep CNN to remove moire artefacts from
photos taken of screens [20] using a non-linear multiresolution analysis of the
moire photos, and created a large-scale benchmark dataset for this task. FFDNet
[27] is an excellent work in denoising that can handle a wide range of noise
levels with a tunable noise level map as the input. Super-resolution tasks bear a
resemblance to image signal processing tasks. In particular, the
Residual-in-Residual Dense Block introduced in ESRGAN [23], which has higher
capacity and is easier to train, proves effective in our enhanced U-Net for the
ISP task. ExpandNet [14] is a three-branch convolutional neural network that
combines local, medium-level and global feature information, and it avoids the use
of upsampling layers to improve image quality when generating HDR content from LDR
content. Tao et al. proposed an efficient and effective network [21] for image
deblurring, which restores sharp images at different resolutions in a pyramid.
Ignatov et al. [7] present a large-scale bokeh dataset consisting of 5K
shallow/wide depth-of-field image pairs and use PyNET to deal with this task.
2.2 ISP Designing

CNNs have not only shown significant advantages in low-level vision tasks but are
also widely used in high-level tasks, such as object detection and segmentation
[4,13]. Therefore, it is natural to apply CNNs to reconstruct high-quality,
full-colour images from image signals (such as the RAW Bayer pattern). However,
despite these successes in various vision tasks, little work has been conducted on
ISP pipeline learning. Deep Camera [17] analyzed why the traditional ISP pipeline
may be tough to tune and developed a fully convolutional network to perform the
ISP pipeline. Ignatov et al. [9] proposed the RAW-to-RGB data set and the
end-to-end PyNET to conduct the ISP task for the first time. This work shows the
potential of CNNs for image processing as a substitute for hardware modules, even
the most sophisticated ISP. CameraNet [11] categorized the ISP pipeline into two
weakly correlated parts, restoration and enhancement, and proposed a two-stage
network to account for the two independent operations. In this paper, we propose a
simple but effective EEDNet that achieves better performance in both PSNR and
visual effect.

Fig. 1. Five-level U-Net [19] with an additional upsample layer added to the top of
the U-Net. (Diagram labels: Input (Level 1), Level 2, Level 3, Level 4, The Highest
Level.)

Fig. 2. Receptive fields of the highest-level layers in different encoder-decoder
networks and their corresponding fidelity. (Plot axes: RFs on the horizontal axis,
ticks from 0 to 160; PSNR on the vertical axis, ticks from 20.85 to 21.2.)

3 Analysis of Receptive Field
Before introducing the network, we briefly review the definition of the receptive
field (RF). A receptive field is defined as the portion of space or spatial
construct containing units that provide input to a set of units within a
corresponding layer [2]. We hypothesize that a Large Receptive Field (LRF) is
required by high-level image understanding tasks, while it is not strongly related
to low-level image processing tasks. In this section, comprehensive experiments
are conducted to demonstrate that the RF is one of the critical factors in the
RAW-to-RGB task.
Without loss of generality, we define the encoder’s bottom-up RF calculation
formula as follows:

\[
F_n = F_{n-1} + (k_n - 1) \prod_{i=1}^{n-1} s_i \tag{1}
\]

F_n and F_{n-1} represent the required RF of the nth layer and the known RF of the
(n-1)th layer, whose initial value is 1. k_n stands for the kernel size of the nth
layer, and s_i is the stride of layer i.
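As a minimal sketch of Eq. (1), the following Python function accumulates the receptive field layer by layer, multiplying the kernel term by the product of the strides of all preceding layers as in the standard calculation described in [2]. The function name `receptive_field` and the toy layer list are illustrative assumptions, not a configuration taken from EEDNet; pooling layers are simply listed with their own kernel size and stride, and dilation is not modelled.

```python
def receptive_field(layers):
    """Return the receptive field after each layer.

    `layers` is a list of (kernel_size, stride) pairs ordered from input to
    output. Pooling layers are listed like any other layer.
    """
    rf = 1      # initial value of the recursion: a single input pixel
    jump = 1    # product of the strides of all preceding layers
    fields = []
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        fields.append(rf)
        jump *= stride
    return fields


# Hypothetical toy encoder: two 3x3 convs, one 2x2 max pooling, one 3x3 conv.
toy_encoder = [(3, 1), (3, 1), (2, 2), (3, 1)]
print(receptive_field(toy_encoder))  # [3, 5, 6, 10]
```

In this toy example the convolution after the pooling layer adds four pixels instead of two, which illustrates why removing downsampling operations is an effective way to shrink the RF of the highest level.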
In our experiments, we take modified U-Net [19] as a baseline and adjust the
receptive field of the highest level (as shown in the bottom red box in Fig. 1) by
four factors: the number of downsampling operations, the size of the filters, the
depth of each level, and the dilation rate.
– For downsampling, we gradually remove the max pooling layers from top to
bottom. The convolutional layers after a removed pooling layer are also removed,
which ensures that the PSNR improvement is not obtained by increasing the
computation.
– For kernel size, we randomly select several low-level convolutional layers and
change their kernel size from 1 × 1 to 9 × 9 (with a step size of 2) without
changing the architecture.
– For the number of convolutional layers in each level, we randomly remove
some convolutional layers belonging to the highest level. At the same time,
we add corresponding layers at lower levels to keep the computing cost
approximately the same.
– For dilated convolution, we randomly select one normal convolutional layer
in the encoder and replace it with dilated convolutional layers of different
dilation rates.
The results are shown in Fig. 2. In conclusion, the RF of the highest level should
not be too large, and there is a rough range of favourable RF configurations
(preferably between 10 and 60). This range can be reached with operations that
shrink the receptive field, such as adopting convolutions with small kernel sizes
and removing downsampling operations. Outside this range, the fidelity changes
drastically.
4 Proposed EEDNet
In this section, we introduce EEDNet, which is inspired by U-Net [19] and the LRF
hypothesis. Besides, to make EEDNet focus more on the significant changes of
pixels in the RGB domain, we propose the Channel Attention Residual Dense Block
(CA-RDB) and the ClipL1 loss.
4.1 Network Design

According to the LRF hypothesis, an LRF may make the network architecture
sub-optimal. To avoid this problem, we design EEDNet with only three downsampling
layers to obtain an appropriate receptive field. Besides, the RAW-to-RGB task
generally involves both global and local image corrections. Layers belonging to
different levels should have different sensitivity to high-level properties, such
as brightness or white balance, and to low-level features, like textures and
edges. In light of this, we apply the Channel Attention Residual Dense Block
(CA-RDB) to the skip connections, as shown in Fig. 3. Adding channel attention [5]
after the RDBs is a heuristic for making the skip connections focus on useful
information.

Fig. 3. Overall structure of EEDNet.

For low-level tasks, especially pixel-to-pixel ones, the information in each pixel
of each sample is very important. Batch Normalization (BN) [10], which considers
the content of all pictures in a batch, may therefore lose the unique details of
each sample. Similarly, algorithms like Layer Normalization (LN) [3], which
compute statistics across channels, may ignore the differences between channels.
The RAW-to-RGB task is similar to style transfer in that models should focus on
the uniqueness of each sample, since the generated images depend on the
corresponding input images. In this case, Instance Normalization (IN) [22] becomes
an ideal choice. Furthermore, to obtain more effective statistical information, we
adopt SN, which combines the characteristics of BN, LN and IN. In Sect. 5.4, we
verify that SN is more effective than IN. LeakyReLU is applied after each
convolutional layer, except for the last layer. Besides, we use nearest neighbour
interpolation for upsampling to avoid time-consuming deconvolutions.
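To illustrate why per-sample statistics preserve the uniqueness of each image, the following sketch implements plain instance normalization on nested Python lists. It is an illustration under stated assumptions, not the normalization layer used in EEDNet: the function name `instance_norm`, the `batch[sample][channel][pixel]` layout and the `eps` value are hypothetical, and the learnable scale and shift that normalization layers usually carry are omitted.

```python
def instance_norm(batch, eps=1e-5):
    """Normalise every (sample, channel) plane with its own mean and variance.

    `batch` is a nested list indexed as batch[sample][channel][pixel].
    """
    out = []
    for sample in batch:
        normed_sample = []
        for plane in sample:
            mean = sum(plane) / len(plane)
            var = sum((v - mean) ** 2 for v in plane) / len(plane)
            scale = (var + eps) ** 0.5
            normed_sample.append([(v - mean) / scale for v in plane])
        out.append(normed_sample)
    return out


# A dark and a bright single-channel sample (hypothetical values).
dark_and_bright = [[[0.0, 2.0]], [[10.0, 30.0]]]
normalised = instance_norm(dark_and_bright)
```

Both samples end up close to [-1, 1] because each one is normalised with its own statistics; with batch-level statistics, the bright sample would shift the mean and variance applied to the dark one.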
Fig. 4. Loss comparison between L1 loss, L2 loss and ClipL1 loss
Fig. 5. Gradient comparison between L1 loss, L2 loss and ClipL1 loss. The x axis is
the residual value between prediction and ground truth image, and the y axis is the
gradient value
4.2 ClipL1 Loss
ClipL1 Loss is inspired by the focal loss [12], which is designed to address class
imbalance in object detection. As easy examples can overwhelm training and lead to
degenerate models, we propose a ClipL1 Loss with only a small modification to the
original L1 Loss:

\[
L_{ClipL1}(x, y) =
\begin{cases}
c_{min}, & \text{if } |x - y| < c_{min} \\
|x - y|, & \text{if } c_{min} \le |x - y| \le c_{max} \\
c_{max}, & \text{if } |x - y| > c_{max}
\end{cases}
\tag{2}
\]
where x is the RGB image reconstructed by our network, and y represents the
ground truth RGB image from the Canon ISP. c_min and c_max are thresholds for
clipping easy samples and outliers. Figure 4 shows the comparison between L1
Loss, L2 Loss and ClipL1 loss. We regard every pixel in an image as one sample and
reset its loss to the nearest threshold if its absolute residual falls outside the
range between c_min and c_max. The gradient comparison between the different
losses is shown in Fig. 5.
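The clipping in Eq. (2) and its effect on the gradient can be sketched in plain Python as follows. This is a sketch under assumptions rather than the authors' implementation: the function names `clip_l1` and `clip_l1_grad` are hypothetical, the mean reduction over pixels is an assumption (the reduction is not stated in the text), and the default thresholds are the values reported in Sect. 5.2.

```python
def clip_l1(pred, target, c_min=1.0 / 255.0, c_max=1.0):
    """Mean ClipL1 loss over two equally long sequences of pixel values."""
    total = 0.0
    for x, y in zip(pred, target):
        residual = abs(x - y)
        # Clamp the per-pixel L1 residual into [c_min, c_max], as in Eq. (2).
        total += min(max(residual, c_min), c_max)
    return total / len(pred)


def clip_l1_grad(x, y, c_min=1.0 / 255.0, c_max=1.0):
    """Per-pixel derivative of the clipped loss with respect to x."""
    residual = x - y
    if abs(residual) < c_min or abs(residual) > c_max:
        return 0.0  # clipped region: the loss is constant, so no gradient
    return 1.0 if residual > 0 else -1.0


# Hypothetical pixels: an easy one, a regular one and an outlier.
print(clip_l1_grad(0.501, 0.5), clip_l1_grad(0.7, 0.5), clip_l1_grad(3.0, 0.5))
# 0.0 1.0 0.0
```

Pixels whose residual lies in a clipped region contribute a constant to the loss and therefore no gradient, so only the remaining pixels drive the parameter update.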
Table 1. The test set results of the AIM 2020 Learned Smartphone ISP Challenge,
Track 1 (Fidelity).

Models          PSNR    SSIM
EEDNet (ours)   22.26   0.7913
2nd             21.92   0.7865
3rd             21.91   0.7842
4th             21.90   0.7829
5th             21.86   0.7807
6th             21.56   0.7846
7th             21.40   0.7834
8th             21.17   0.7794
9th             21.14   0.7729
10th            20.19   0.7622
11th            20.13   0.7438

5 Experiments
5.1 Dataset
We use the Zurich RAW to RGB dataset in our experiments, which is supplied by the
AIM 2020 Learned Smartphone ISP Challenge [8]. The data set consists of 48403
RAW-RGB image pairs. The input RAW images were captured by a Huawei P20
smartphone, and a Canon 5D Mark IV camera was used to collect the ground truth RGB
images. The scenes in the data set are mainly streets, sky, trees and lawns. The
training data set consists of image patch pairs. The test data in Track 1 are also
RAW image patches, while in Track 2 they are full-resolution RAW images. However,
part of the ground truth RGB patches in the training data set are of low quality
with various defects. For example, most of the images with grass are blurry, and
some images with sky are overexposed. To build a reliable model for reconstructing
RGB images from the RAW pattern, we manually cleaned the dataset. After that,
about 22437 image pairs are left.
5.2 Implementation Details
Training Process. Our method is implemented in PyTorch 1.5.0 and trained on
8 NVIDIA RTX 2080 Ti GPUs (11 GB). The negative slope of Leaky ReLU is 0.2 in our
EEDNet. The initial learning rate is 10^{-3} and is reduced to 1/10 of its
previous value at the 33rd and 46th epochs. We train the networks using the Adam
optimizer with β1 = 0.9 and β2 = 0.999 and a mini-batch size of 48. In each
training batch, we apply random geometric transformations: 90° and 180° rotations,
and horizontal and vertical flipping. We train EEDNet for 50 epochs with ClipL1
loss, which takes about 2.5 h. c_min and c_max are set to 1/255 and 1,
respectively, in our experiments. Note that since ClipL1 loss only focuses on
changes within a certain range, it will be sub-optimal when the network is trained
with other losses.
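The step schedule described above can be written as a small function. This is a sketch only: the name `learning_rate`, the 1-based epoch indexing and the choice that the reduced rate takes effect from the milestone epoch onwards are assumptions, since the text does not specify them.

```python
def learning_rate(epoch, base_lr=1e-3, milestones=(33, 46), gamma=0.1):
    """Step schedule: multiply the rate by `gamma` at each milestone epoch."""
    drops = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** drops


# Rates at the start, after the first drop and after the second drop.
for epoch in (1, 33, 46):
    print(epoch, learning_rate(epoch))
```

In PyTorch, `torch.optim.lr_scheduler.MultiStepLR` provides this kind of milestone-based schedule.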
Testing Process. For Track 1, we trained 5 models with the same settings for
ensembling: a 4-level U-Net with Leaky ReLU and SN (called Modified U-Net) trained
with mean square error (MSE), Modified U-Net trained with L1, Modified U-Net
trained with ClipL1, EEDNet trained with MSE + 0.8 ∗ MS-SSIM, and EEDNet trained
with ClipL1. For Track 2, we only took the full-resolution output images of EEDNet
trained with ClipL1 as the final submitted results.
5.3 Results
The competition results of Track 1 are shown in Table 1, where our method takes
first place. Our best single model achieves 21.63 dB on the development set, and
the submitted ensemble model reaches 22.26 dB on the test set. Some of the
full-resolution images are presented in Fig. 6. On these images, we still achieve
state-of-the-art results. Compared to PyNET [9], our processed images have softer
lighting, rich colours and no overexposure.
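For reference, the PSNR values reported in this section follow the standard definition PSNR = 10 log10(MAX^2 / MSE). The sketch below computes it for flat sequences of pixel values; the function name `psnr` and the 8-bit peak value of 255 are assumptions, and the exact evaluation protocol of the challenge (value range, averaging over images) is not reproduced here.

```python
import math


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally long sequences."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel values with a constant error of 25.5 grey levels.
print(psnr([0.0, 100.0], [25.5, 125.5]))
```

Because PSNR is logarithmic in the MSE, a gain of 0.3 dB corresponds to roughly a 7% reduction in mean squared error.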
5.4 Ablation Studies

To study the effect of each component of the proposed EEDNet, we gradually modify
the baseline U-Net [19] and compare the differences. The overall visual comparison
is illustrated in Fig. 7. Each column represents a model with its configuration
shown at the top. For clarity, we put the full-resolution images processed by our
EEDNet in the 1st column. In the 2nd to 4th columns, we first compare the effects
of BN, IN and SN. It can be seen that U-Net with SN has the best effect. Besides,
based on the LRF hypothesis in Sect. 3, we remove a downsampling layer of U-Net
and the convolutional layers belonging to the highest level. The 5th column shows
the results of this model, which are significantly better than those of the models
on its left. All the above models are supervised by the L1 loss function. In the
7th column, we use the proposed ClipL1 to train our EEDNet, which shows its
superiority over the other models.
5.5 Smartphone-Image-Denoising Results

To further verify our LRF hypothesis, we conduct extended experiments on the SIDD+
task [1]. First, U-Net [19] is modified by removing one downsampling layer and its
corresponding upsampling layer to decrease the RF of the highest level; we denote
the result as U-Net*. For a fair comparison, we train both models without data
augmentation under the same conditions.
In this pair of experiments, U-Net and the modified U-Net* are supervised with
L1 loss and optimized by the Adam algorithm. On the real image denoising rawRGB
data set, we train both models for 4000 epochs with a batch size of 2, where the
learning rate is 10^{-4} and steps to 10^{-5} at epoch 1500. On the sRGB data set,
we train the models for 1200 epochs with a batch size of 48, where the learning
rate is 3 × 10^{-4} for the first 10 epochs and 3 × 10^{-5} until the training
process finishes. The experiments are performed on an RTX 2080 Ti.

Fig. 6. Comparison of different ISPs (columns: Huawei_RAW, Huawei_ISP,
Canon_ISP/GT, PyNET, EEDNet (ours)). The Canon output photo in the middle is the
ground truth of the RAW-to-RGB task. In particular, for the first row, the photos
processed by our AutoISP are more colorful compared with PyNET [9] and closer to
the Canon camera’s output. (Color figure online)

The results are shown in Table 2. On the rawRGB task (left), U-Net* achieves a
slightly higher PSNR than U-Net (52.33 dB vs. 52.29 dB). Since the SSIM is already
high, there is not much improvement. On the sRGB task (right), however, both the
PSNR and the SSIM of U-Net* are noticeably higher than those of U-Net (36.15 dB
vs. 35.93 dB, and 0.9086 vs. 0.9052). Therefore, our LRF hypothesis applies to the
SIDD+ task regardless of the colour space.
Column     2nd  3rd  4th  5th  6th  7th
BN          √    -    -    -    -    -
IN          -    √    -    -    -    -
SN          -    -    √    √    √    √
4-levels    -    -    -    √    √    √
CA_RDB      -    -    -    -    √    √
ClipL1      -    -    -    -    -    √

Fig. 7. Overall visual comparisons showing the effect of each component in EEDNet.
Each column represents a model with its configuration at the top. (The 1st column
shows the full-resolution images processed by EEDNet, as described in Sect. 5.4.)
Table 2. Comparison of U-Net with different receptive fields. Fh represents the RF
of the highest level in U-Net or U-Net*. The left pair of result columns presents
the results on the rawRGB task, and the right pair the results on the sRGB task.

                 rawRGB              sRGB
Models   Fh      PSNR    SSIM        PSNR    SSIM
U-Net    140     52.29   0.995871    35.93   0.9052
U-Net*   108     52.33   0.995878    36.15   0.9086

6 Conclusion
In this article, we propose and verify the hypothesis that an LRF is not crucial
in image processing tasks, especially the RAW-to-RGB task and the SIDD+ task. This
prior knowledge should benefit other basic image processing research. Then, ClipL1
Loss is proposed to enhance the sensitivity of EEDNet to the RGB colour space.
Finally, our EEDNet further narrows the gap between CNNs and the ISP of a DSLR,
making CNNs more likely to replace the ISP on smartphones. Although we obtain
considerable results on the RAW-to-RGB data set, it should be noted that the data
set is relatively small and its scenes are not diverse, which makes the model
sub-optimal in real scenes, such as at night. In future work, we will expand the
RAW-to-RGB data set, enrich its scenes, and seek an effective solution for white
balance. At the same time, we will further optimize EEDNet and reduce its
computation so that it can be applied to mobile phones.
Acknowledgements. This work was supported by the Advance Research Program
(31511130301), the National Key Research and Development Program (2017YFF0209806),
and the National Natural Science Foundation of China (No. 61906193; No. 61906195;
No. 61702510).
References
1. Abdelhamed, A., et al.: Ntire 2020 challenge on real image denoising: Dataset,
methods and results. arXiv preprint arXiv:2005.04117 (2020)
2. Araujo, A., Norris, W., Sim, J.: Computing receptive fields of convolutional
neural networks. Distill (2019). https://doi.org/10.23915/distill.00021,
https://distill.pub/2019/computing-receptive-fields
3. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint
arXiv:1607.06450 (2016)
4. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab:
Semantic image segmentation with deep convolutional nets, atrous convolution,
and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848
(2017)
5. Chen, L., Zhang, H., Xiao, J., Nie, L., Shao, J., Liu, W., Chua, T.S.: Sca-cnn: Spa-
tial and channel-wise attention in convolutional networks for image captioning. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
pp. 5659–5667 (2017)
6. Fan, Y., Yu, J., Liu, D., Huang, T.S.: Scale-wise convolution for image restoration
(2019)
7. Ignatov, A., Patel, J., Timofte, R.: Rendering natural camera bokeh effect with
deep learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR) Workshops (June 2020)
8. Ignatov, A., Timofte, R., et al.: AIM 2020 challenge on learned image signal pro-
cessing pipeline. In: European Conference on Computer Vision Workshops (2020)
9. Ignatov, A., Van Gool, L., Timofte, R.: Replacing mobile camera isp with a single
deep learning model. arXiv preprint arXiv:2002.05509 (2020)
10. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by
reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
11. Liang, Z., Cai, J., Cao, Z., Zhang, L.: Cameranet: A two-stage framework for
effective camera isp learning (2019)
12. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object
detection. In: Proceedings of the IEEE International Conference on Computer
Vision. pp. 2980–2988 (2017)
13. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic
segmentation. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition. pp. 3431–3440 (2015)
14. Marnerides, D., Bashford-Rogers, T., Hatchett, J., Debattista, K.: Expandnet: A
deep convolutional neural network for high dynamic range expansion from low
dynamic range content. Comput. Graph. Forum 37(2), 37–49 (2017)
15. Nah, S., Son, S., Timofte, R., Lee, K.M.: Ntire 2020 challenge on image and video
deblurring. In: Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR) Workshops (2020)
16. Purohit, K., Suin, M., Kandula, P., Ambasamudram, R.: Depth-guided dense
dynamic filtering network for bokeh effect rendering. In: 2019 IEEE/CVF Interna-
tional Conference on Computer Vision Workshop (ICCVW). pp. 3417–3426 (2019)
17. Ratnasingam, S.: Deep camera: A fully convolutional neural network for image
signal processing. In: Proceedings of the IEEE/CVF International Conference on
Computer Vision (ICCV) Workshops (2019)
18. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detec-
tion with region proposal networks. In: Advances in Neural Information Processing
Systems. pp. 91–99 (2015)
19. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for
biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi,
A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015).
https://doi.org/10.1007/978-3-319-24574-4_28
20. Sun, Y., Yu, Y., Wang, W.: Moiré photo restoration using multiresolution convo-
lutional neural networks. IEEE Trans. Image Process. 27(8), 4160–4172 (2018)
21. Tao, X., Gao, H., Shen, X., Wang, J., Jia, J.: Scale-recurrent network for deep
image deblurring. In: Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR) (2018)
22. Ulyanov, D., Vedaldi, A., Lempitsky, V.: Instance normalization: The missing
ingredient for fast stylization. arXiv preprint arXiv:1607.08022 (2016)
23. Wang, X., et al.: Esrgan: Enhanced super-resolution generative adversarial net-
works. In: Proceedings of the European Conference on Computer Vision (ECCV)
Workshops (2018)
24. Yan, Q., et al.: Deep hdr imaging via a non-local network. IEEE Trans. Image
Process. 29, 4308–4322 (2020)
25. Yuan, S., et al.: Aim 2019 challenge on image demoireing: Methods and results.
In: 2019 IEEE/CVF International Conference on Computer Vision Workshop
(ICCVW). pp. 3534–3545 (2019)
26. Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a gaussian denoiser:
Residual learning of deep cnn for image denoising. IEEE Trans. Image Process.
26(7), 3142–3155 (2017)
27. Zhang, K., Zuo, W., Zhang, L.: Ffdnet: Toward a fast and flexible solution for
cnn-based image denoising. IEEE Trans. Image Process. 27(9), 4608–4622 (2018)
