
Research Article / Vol. 30, No. 20 / 26 Sep 2022 / Optics Express 36973

Investigating deep optics model representation in affecting resolved all-in-focus image quality and depth estimation fidelity

Xin Liu,1,4 Linpei Li,1,4 Xu Liu,1 Xiang Hao,1,2,5 and Yifan Peng3,6,*
1 State Key Laboratory of Modern Optical Instrumentation, College of Optical Science and Engineering,
Zhejiang University, Hangzhou 310027, China
2 Intelligent Optics & Photonics Research Center, Jiaxing Research Institute Zhejiang University, Jiaxing
314000, China
3 Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Rd., Hong
Kong SAR, China
4 Equal contributors
5 haox@zju.edu.cn
6 evanpeng@hku.hk
* evan.y.peng@gmail.com

Abstract: The end-to-end (E2E) optimization of optics and image processing, dubbed deep
optics, has renewed the state-of-the-art in various computer vision tasks. However, specifying
the proper model representation or parameterization of the optical elements remains elusive.
This article comprehensively investigates three modeling hypotheses of phase coded-aperture
imaging under a representative context of deep optics: joint all-in-focus (AiF) imaging and
monocular depth estimation (MDE). Specifically, we analyze the respective trade-off of these
models and provide insights into relevant domain-specific requirements, explore the connection
between the spatial feature of the point spread function (PSF) and the performance trade-off
between the AiF and MDE tasks, and discuss the model sensitivity to possible fabrication errors.
This study provides new prospects for future deep optics designs, particularly those aiming for
AiF and/or MDE.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction
Deep optics, the encompassing field of jointly designing optics, sensor electronics, and processing
algorithms in an end-to-end (E2E) manner, has enhanced the performance of many image-related
measurement tasks [1], including spectral imaging [2–4], single-molecule localization microscopy
[5], extended depth of field [6], achromatic imaging [7], high dynamic range imaging [8], and
the combination of all-in-focus (AiF) imaging and monocular depth estimation (MDE) [9,10].
Of these tasks, high-quality AiF-MDE, in particular, has a substantial impact on many other
exciting fields, including robotics [11], autonomous driving [12], and augmented reality [13].
Conventional approaches relying on time-of-flight, stereo pairs, or structured illumination are
often accompanied by unwieldy, bulky hardware kits. Fortunately, many deep optics setups
simply use widely accessible lenses with a lightweight reinforcement, such as a diffractive optical
element (DOE) to code the aperture phase.
A general framework of deep optics design is illustrated in Fig. 1. The point spread
function (PSF) of the optical system, encoded by the specified optics model, is convolved with the
ground truth image, which contains domain-specific information such as the depth or spectrum
of the scene. The sensor data are formed by compressing the entire wave field into an RGB or
monochromatic image. These measurement data are then fed into a neural network, whose output
is the predicted counterpart of the ground truth information. The loss function evaluates the
deviation between the ground truth information and the prediction. During the E2E optimization,
the backpropagation starts from the loss and continues along the entire pipeline. As such, the
parameters of both the optics model and the neural network can be updated iteratively. Once
the optimization converges, the optimal optical design and the corresponding image processing
network for a domain-specific task can be obtained.

#473084 https://doi.org/10.1364/OE.473084
Journal © 2022 Received 19 Aug 2022; revised 10 Sep 2022; accepted 12 Sep 2022; published 22 Sep 2022
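The forward pass of the Fig. 1 pipeline can be sketched in a few lines of NumPy. This is a toy, single-depth, grayscale sketch under stated assumptions: `psf_from_phase` uses a simple far-field (Fourier) pupil model, the depth-loss term is a placeholder, and all names are illustrative rather than taken from the paper's code.

```python
import numpy as np

def psf_from_phase(phase):
    """Toy far-field PSF: intensity of the Fourier transform of the pupil
    exp(1j*phase). Real designs propagate the field per depth and wavelength."""
    field = np.fft.fftshift(np.fft.fft2(np.exp(1j * phase)))
    psf = np.abs(field) ** 2
    return psf / psf.sum()          # unit-energy PSF

def sensor_image(scene, psf):
    """Image formation: circular convolution of the scene with the PSF."""
    kernel = np.fft.fft2(np.fft.ifftshift(psf))
    return np.real(np.fft.ifft2(np.fft.fft2(scene) * kernel))

rng = np.random.default_rng(0)
phase = rng.uniform(0.0, 2 * np.pi, (64, 64))   # PW phase variables, cf. Fig. 1(a)
scene = rng.uniform(0.0, 1.0, (64, 64))          # ground-truth AiF patch
img = sensor_image(scene, psf_from_phase(phase))

# Weighted multi-task loss, L = w_RGB * L_AiF + w_Depth * L_MDE; the depth term
# is a placeholder here. In the real pipeline both terms compare CNN outputs
# with ground truth, and gradients flow back through the PSF into `phase`.
w_rgb, w_depth = 1.0, 1.0
loss = w_rgb * np.mean((img - scene) ** 2) + w_depth * 0.0
```

In an actual E2E design the same graph is built in an autodiff framework so that the phase variables and the network weights are updated together.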

[Fig. 1 diagram: Optics Model → PSF → Sensor Image → Neural Network → Prediction (All-in-Focus, Depth Map, Hyperspectrum, High Dynamic Range, Achromat, …), compared against the Ground Truth (domain-specific information) by the Loss; insets (a) PW, (b) CR, (c) OAM phase textures, 0 to 2π.]
Fig. 1. Illustration of a general deep optics framework and the phase texture visualization
of the three investigated optics model representations. (a) PW model. (b) CR model. (c) OAM
model. The solid and dashed arrows denote the forward and backward propagation, respectively.

Notably, such designs generally involve considerations of computational efficiency, optimization
complexity, degree of freedom (DoF), fabrication feasibility, and calibration complexity.
Benefiting from emerging hardware and software technologies, computational efficiency is
becoming less of a concern. The primary issue at present is that such optics models require a large
DoF for optimization, which is often constrained simultaneously by fabrication limits of all kinds
[8]. In search of an appropriate optics model for a specific visual task, one often seeks ways to
expand the DoF of the design. However, even when fabrication constraints are taken into account,
the extra DoF may in turn cause notorious calibration hardships when dealing with fabrication
and assembly errors in practice. From this point of view, a freer optics model, if it does not promise
a significant improvement, may not deserve the hassle of its fabrication and calibration and
is therefore much less valuable. This runs counter to the common intuition that a higher DoF is
always better. Moreover, an intuitive yet simply structured parameterization may
be beneficial not only for reducing optimization complexity but also for improving computational
efficiency.
As a showcase of such a circumstance, in this article we comprehensively compare three
phase-encoding parameterization types. Starting from an extreme case where the full DoF of a
DOE is warranted, we assume each of the etched pixels to be independent of the others, leading
to a pixel-wise (PW) parameterized phase distribution during optimization [Fig. 1(a)]. This
scheme has been widely used in state-of-the-art designs [5,6]. However, the PW model exerts very little
control over local smoothness, making the exact design hard to realize in practice.
On the contrary, considering the fabrication feasibility of optics, rotationally symmetric shapes have been
the most favored type for centuries, starting with spherical lenses and later aspherical ones
[14]. Their counterparts in the DOE sense, Fresnel lenses consisting of concentric rings, have
also been intensively investigated [15,16]. This type of DOE design possesses only one DoF,
namely the radius of curvature of the surface.
Notably, to enable a higher DoF, one can further relax the radial phase continuity constraint and
treat the rings as independent variables; thereby, concentric ring (CR) phase plates [Fig. 1(b)] have
recently been reported to solve AiF, MDE, and achromatic imaging problems [7,9,17,18]. Like
conventional spherical optics, the CR model is robust to the practical installation and calibration
of lenses. Moreover, this symmetric characteristic not only accelerates the diffraction modeling
via dimensionality reduction but also facilitates a relatively practical fabrication process.
Unfortunately, the radial discontinuity of the CR type makes sophisticated fabrication techniques
like lithography indispensable. From this point of view, this parameterization option is
over-constrained and wastes the underlying DoF that lithography techniques can grant inside the
otherwise planar rings [19]. To extend the DoF of the CR type, adding extra features to the phase
distribution is often considered. Although various parameterizations are available, if the phase
distribution of each ring in the CR model is treated as a zero-order polynomial of the azimuthal
angle, the DoF can be extended by intuitively introducing an extra first-order term, i.e., a vortex
shape [see Fig. 1(c)]. This term expands the DoF by a factor of two, providing more encoding
capability for modulating incoming waves subject to desired tasks. It is worth noting that this
parameterization is similar to the phase plates forming helix-shaped PSFs, which have been widely
used for depth estimation [20–22]. In addition, it simultaneously imparts extra orbital
angular momentum (OAM) to photons transmitted through each ring [23]; thus we term it the OAM
model in this work.
The fact that most imaging systems feature a circular aperture naturally raises the option of
using Zernike polynomials to parameterize the aperture wavefront [24], which have been widely
applied in deep optics [4,6,25]. The Zernike model differs from the aforementioned sub-aperture
parameterization method in that it is realized by the summation of a set of full-aperture orthogonal
and smooth bases. Although this model guarantees phase smoothness, it can be ambiguous to
specify the proper number and type of bases from the infinitely many candidates. Importantly, for the target
AiF-MDE task, the Zernike model has shown sub-optimal performance compared with the CR
model, as indicated in prior work [9]. Therefore, for brevity, we exclude this model
from our investigation.
We adopt the AiF-MDE task to compare the three parameterization manners. As a representative
deep optics application, the AiF-MDE task contains inherent conflicts between its sub-goals:
AiF prefers an easily invertible PSF, which tends to be depth-invariant [26], while MDE
tends to preserve depth-dependent features [24]. Fortunately, the
two tasks can still be balanced well in an E2E scheme, especially when a powerful neural network is
involved [9]. Indeed, one of the greatest strengths of deep optics is that E2E optimization can
usually further improve the overall performance of a multi-objective design despite trade-offs
that appear inevitable from the viewpoint of sequential, separated designs.
In this work, we investigate the impact of the parametric strategy of a deep optics model on its
potential for balancing the AiF and MDE sub-goals and thereby lifting the
overall performance. First, to assess the performance of the optics models on this well-known
trade-off, we train three models (PW, CR, and OAM) with different weight assignments for the
AiF and MDE sub-goals and derive the corresponding metrics. The performance comparison
of the three optics models is presented, with possible insights thoroughly analyzed. Then, we
characterize the PSFs of the three optics models and reveal the link between the PSF features and
the trade-off behavior. Finally, these optics models are assessed taking synthetic
fabrication errors into account.

2. Optics model representation


Although optical coding can be realized on various platforms, here we take the phase-only DOE
as an example considering its widespread use in E2E designs. The phase modulation induced by
a DOE can be described as
\[ \phi(u, v, \lambda) = \frac{2\pi \left[ n(\lambda) - n_{\text{air}} \right]}{\lambda}\, h(u, v), \tag{1} \]

where (u, v) indicate the lateral coordinates of the pixels on the aperture, λ is the wavelength
of the incident light, n is the refractive index of the DOE material, n_air ≈ 1 is that of air, and
h is the height profile of the DOE.
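Eq. (1) is straightforward to implement. The sketch below assumes an illustrative fused-silica-like index of n ≈ 1.46 at 545 nm; the helper name `doe_phase` and the sample heights are our choices, not values from the paper.

```python
import numpy as np

def doe_phase(height, wavelength, n_doe, n_air=1.0):
    """Phase modulation of Eq. (1): phi = 2*pi*[n(lambda) - n_air]/lambda * h."""
    return 2.0 * np.pi * (n_doe - n_air) / wavelength * height

lam, n = 545e-9, 1.46                # illustrative material index at 545 nm
h = np.linspace(0.0, 1.2e-6, 5)      # sample heights in meters
phi = doe_phase(h, lam, n)

# The height giving a full 2*pi phase shift at this wavelength:
h_2pi = lam / (n - 1.0)
```

Note the wavelength dependence: a profile giving a 2π shift at one wavelength gives a different (fractional) shift at another, which is the root of the multi-wavelength encoding difficulty discussed later.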
General criteria of optics model representation include the quality of convergence during
optimization, feasibility for fabrication, and the capability for information encoding. The most
straightforward type for numerical optimization is the PW parameterization [8], i.e., the height of
every pixel on the DOE is treated as a variable in the E2E optimization. Although this model
theoretically admits all possible phase modulations of a DOE, in practice it can struggle
to converge to a reasonably good result due to the enormous number of parameters. Worse still, the optimized
DOE most likely features a rough height profile, since each pixel is optimized independently,
resulting in fabrication challenges.
To decrease the number of parameters and alleviate possible fabrication complexity, recently,
a model constituted by a set of vanilla concentric rings, i.e., CR model, has been proposed,
manifesting great superiority in applications of AiF-MDE [9] and diffractive achromats [7]. The
phase coding of this model can be formulated by
\[ \phi(r, \lambda) = \sum_{m=1}^{M} b_m(\lambda) \left[ \mathrm{Circ}_m(r) - \mathrm{Circ}_{m-1}(r) \right], \tag{2} \]

where $r = \sqrt{u^2 + v^2}$, $\mathrm{Circ}_m(r) = \mathrm{circ}(r/r_m)$, $\mathrm{Circ}_0(r) = 0$, and $r_m = m d$ for $m = 1, 2, \ldots, M$ ($d$
is the width of each ring, and circ is the unit circ function). The phase shifts $b_m$ of each ring
are the variables to optimize. Thanks to the rotationally symmetric structure, the efficiency of
numerical computation can be tremendously enhanced by reducing the diffraction calculation
from two dimensions (2D) to 1D. In addition, this feature simplifies fabrication and
hardware assembly. However, the relatively simple structure in turn significantly limits the DoF:
the resulting PSF is restricted to a circular shape, which indicates that information can be
encoded along only one spatial dimension.
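A minimal sketch of the CR parameterization of Eq. (2), assuming a square sampled aperture. The name `cr_phase`, the ring width `d`, the grid size, and the pixel `pitch` are illustrative choices, not values from the paper.

```python
import numpy as np

def cr_phase(b, d, grid_size, pitch):
    """Concentric-ring phase of Eq. (2): ring m (width d) carries the constant
    shift b[m]; `pitch` is the pixel size on the aperture plane (meters)."""
    c = (grid_size - 1) / 2.0
    u, v = np.meshgrid(np.arange(grid_size), np.arange(grid_size), indexing="ij")
    r = np.hypot(u - c, v - c) * pitch        # radial coordinate of each pixel
    m = np.floor(r / d).astype(int)           # ring index of each pixel
    phase = np.zeros_like(r)
    inside = m < len(b)                       # pixels beyond ring M stay zero
    phase[inside] = np.asarray(b)[m[inside]]
    return phase

b = np.array([0.0, 1.5, 0.7, 2.9])            # M = 4 optimizable ring phases
phi = cr_phase(b, d=4e-6, grid_size=65, pitch=1e-6)
```

The rotational symmetry is what permits the 2D-to-1D reduction mentioned above: only the 1D radial profile `b` needs to enter the diffraction calculation.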
In principle, depth information can be directly extracted from the depth-dependent shape of the
PSF. One representative is the helix-shaped PSF [20–22], which features a depth-dependent
rotation. The corresponding phase profile is also constituted by a set of concentric rings, but
with spiral profiles rather than flat ones. Specifically, the phase modulation of each ring is lφ
(l is the topological charge, and φ is the azimuthal angle on the aperture plane). Therefore, the
incident light of each ring carries an OAM of lℏ per photon (ℏ is the reduced Planck constant).
Adding a constant phase shift b to each of the rings extends the DoF without introducing
manufacturing difficulty. This modification yields a generalized model of both the OAM and
the CR types, which can be expressed with similar notations as
\[ \phi(r, \varphi, \lambda) = \sum_{m=1}^{M} \left[ l_m(\lambda)\, \varphi + b_m(\lambda) \right] \left[ \mathrm{Circ}_m(r) - \mathrm{Circ}_{m-1}(r) \right], \tag{3} \]

where $l_m$ is the topological charge induced by the $m$-th ring. Conventionally, it is defined as
an integer; however, here we set $l_m \in \mathbb{R}$ because an integer $l_m$ is intractable for simultaneously
encoding multiple wavelengths.
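The generalized model of Eq. (3) only adds the azimuthal term to the ring construction; the sketch below mirrors a CR-style helper, with all names and numbers being illustrative.

```python
import numpy as np

def oam_phase(l, b, d, grid_size, pitch):
    """Generalized ring phase of Eq. (3): ring m contributes l[m]*phi + b[m],
    with phi the azimuthal angle; l may be real-valued, as discussed above."""
    c = (grid_size - 1) / 2.0
    u, v = np.meshgrid(np.arange(grid_size), np.arange(grid_size), indexing="ij")
    r = np.hypot(u - c, v - c) * pitch
    azimuth = np.arctan2(v - c, u - c)        # azimuthal angle on the aperture
    m = np.floor(r / d).astype(int)
    phase = np.zeros_like(r)
    inside = m < len(b)
    phase[inside] = (np.asarray(l)[m[inside]] * azimuth[inside]
                     + np.asarray(b)[m[inside]])
    return phase

l = np.array([0.0, 1.0, 2.0, 1.5])            # per-ring topological charges
b = np.array([0.0, 0.5, 1.0, 0.2])            # per-ring constant shifts
phi = oam_phase(l, b, d=4e-6, grid_size=65, pitch=1e-6)

# With all l = 0 the model degenerates to the CR model of Eq. (2):
phi_cr = oam_phase(np.zeros(4), b, d=4e-6, grid_size=65, pitch=1e-6)
```

The last line makes the CR/OAM relationship explicit: the CR model is the l → 0 special case discussed in Section 3.1.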

3. Implementation and analysis


We establish an E2E pipeline and perform optimization threads with the three genres of DOE models.
In brief, the PSFs of the optical system coded by the DOE are convolved with the ground truth
images segmented according to the ground truth depth map. The sensor image is then simulated by
summing up the convolved segments after correcting occlusion at the transitional regions, and is
pre-processed to form a depth-dependent image stack. This image stack, along with the
sensor image, is fed into a convolutional neural network (CNN), whose output is an estimated
AiF image with its depth map. Specifically, we choose the U-Net as the CNN architecture,
as it is the most representative network in deep optics, especially for AiF-MDE. Details about the
pipeline, datasets, and hyperparameter settings can be found in Supplement Section 1.
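The layered image formation described above can be sketched as follows. This simplified version omits the occlusion correction at layer transitions, uses random stand-in PSFs, and the name `layered_sensor_image` is ours.

```python
import numpy as np

def layered_sensor_image(img_aif, depth_idx, psfs):
    """Simplified layered image formation: segment the all-in-focus image by its
    quantized depth, convolve each layer with that depth's PSF, and sum. The
    occlusion correction at layer transitions used in the paper is omitted."""
    out = np.zeros(img_aif.shape)
    for k, psf in enumerate(psfs):
        layer = img_aif * (depth_idx == k)                  # pixels of layer k
        kernel = np.fft.fft2(np.fft.ifftshift(psf))
        out += np.real(np.fft.ifft2(np.fft.fft2(layer) * kernel))
    return out

rng = np.random.default_rng(1)
img_aif = rng.uniform(0.0, 1.0, (32, 32))     # ground-truth AiF patch
depth_idx = rng.integers(0, 3, (32, 32))      # quantized depth map, 3 layers
psfs = [p / p.sum() for p in rng.uniform(0.0, 1.0, (3, 32, 32))]
sensor = layered_sensor_image(img_aif, depth_idx, psfs)
```

Because each stand-in PSF has unit energy, the simulated sensor image conserves the total energy of the scene, a useful sanity check for this kind of forward model.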
To quantify the two dimensions, AiF image quality and depth estimation fidelity,
we use the checkpoints of the validation step with the lowest loss on AiF images and on depth maps,
denoted by C′ and C″, respectively. Notably, checkpoints are snapshots of the model status at
the current optimization step, which store all values of the hyperparameters and optimizable
variables, including the profile of the DOE and every parameter of the CNN.

3.1. AiF-MDE trade-off analysis


First, we assess the performance of the three models with the weight ratio between the AiF
image and the depth map set to Rw = wRGB : wDepth = 1 : 1. The OAM model is optimized with two
different initialization manners, corresponding to the original double-helix (OAMd) and triple-helix (OAMt)
PSFs, respectively. The PW and CR models are both initialized with zeros. The phase
distributions and PSFs of the initial status can be found in Supplement Fig. S2. Selected estimated
AiF images from C′ and depth maps from C″ are shown in Fig. 2. The results are quantitatively
evaluated by the average peak signal-to-noise ratio (PSNR) of the AiF images and the mean absolute
error (MAE) of the depth maps, respectively. We observe that the quality of the AiF images
reconstructed from the PW model is the best, while that from the OAMt model is the worst. In
contrast, the depth estimation fidelity of the four models exhibits a reversed trend.

[Fig. 2 image grid: columns Ground truth, PW, CR, OAMd, OAMt; rows alternate AiF image crops (PSNR: 30.3, 27.8, 26.1, 25.5 dB and 29.6, 27.6, 26.5, 26.2 dB) and depth maps (MAE: 0.61, 0.56, 0.32, 0.25 m and 0.14, 0.14, 0.11, 0.11 m); depth color scale 1 m to 5 m.]

Fig. 2. Estimated AiF images and depth maps from the four models configured with Rw = 1 : 1.
The 1st and 3rd rows show parts of AiF images from the validation set. The 2nd and 4th rows
are the corresponding depth maps. The image PSNR and the MAE of the depth maps are
indicated at the upper-left of each block. The MAE of the depth maps is measured in meters.
Note that the AiF images and the depth maps are resolved from the saved checkpoints C′ and
C″, respectively.

The average metrics across the whole validation set are listed in Table 1, which is consistent
with the observation from Fig. 2 that the best and worst AiF imaging performances belong
to the PW and OAMt models, respectively. However, the trend for depth estimation
is generally, but not completely, reversed relative to Fig. 2: the MAE of the
depth maps estimated from the PW model is smaller than that of the CR one. This may be due
to the different DoFs of the two models. As mentioned above, the PW model has the highest
DoF; interestingly, however, it does not achieve the best performance in both AiF imaging
and depth estimation. This may also result from the high DoF, which likely leads to
over-convergence on AiF imaging with small but featureless PSFs under this weight combination.
Although the AiF image quality from the OAM models is worse than that of the PW and CR
models, their depth maps show better fidelity. The OAMd model shows better AiF image quality than
the OAMt one, but the two have similar depth estimation performance.

Table 1. Quantitative assessment of the optics models for AiF imaging and depth estimation,
implemented with Rw = 1 : 1.

  Optics model   Image (PSNR) ↑   Depth (MAE) ↓
  PW             29.80            0.056
  CR             28.40            0.058
  OAMd           26.52            0.054
  OAMt           25.86            0.053

Since the captured image is determined by the PSF while the CNN architecture remains
unchanged, we analyze the PSFs of the four models to further explore the underlying reasons
for the performance deviations. The phase distributions (at the principal wavelength of 545 nm)
of the optimized DOE models and the corresponding color PSFs are visualized in Fig. 3. The
superscripts ′ and ″ of each model name denote the results obtained from the two checkpoints
C′ and C″, respectively. For each model, the optimized phase distribution of the DOE and the
corresponding PSFs obtained from C′ and C″ are similar. This observation indicates that the
main difference between C′ and C″ lies in the network parameters, which leads to the different
AiF imaging and depth estimation performance during the training of each model. Among the
four models, unsurprisingly, the phase distribution of the PW model is the roughest. Surprisingly,
however, its phase distribution converges to an almost concentric shape, and so do its PSFs.
For the CR model, the PSFs at all depths exhibit similar circular shapes but
depth-dependent color features, which can assist depth estimation. In contrast, the PSFs of
the two OAM models exhibit more depth-dependent shapes, although they are more diffused
than those of the PW and CR models.
To further assess the performance in balancing the trade-off between the AiF and MDE sub-goals,
we set a series of alternative weight ratios Rw ∈ {1 : 10, 1 : 5, 5 : 1, 10 : 1} in the loss function
and train the four models with these weight ratios. In total, 20 independent optimization
threads are conducted, including the case of Rw = 1 : 1. We then collect all the checkpoints at the
end of every training epoch for a detailed analysis, since they reveal the potential
performance of the optics models over the whole optimization process. The validation
metrics of each checkpoint are presented in Fig. 4. Note that points closer to the top-left
corner represent better overall performance, since that is the direction along which both metrics
evolve towards the optimum. The performance bounds and cloud patterns of the four models suggest
that, despite its rough surface profile, the PW model conquers most of the territory in terms of
achieving better performance in both AiF and MDE. The CR model exhibits better overall
performance than the OAM ones; however, in the region boxed by the solid black line, the OAM
models are likely to obtain better performance than in those boxed by the dashed black lines.
We conclude that, in most cases, the CR model is preferred for both AiF and MDE tasks.

Fig. 3. Optimized phase distribution at the principal wavelength of the DOE models and the
corresponding color PSFs along the target depth range of 1 m to 5 m. The 1st column presents
the optimized phase distribution of the DOE, and the 2nd to 7th columns are the corresponding
PSFs, uniformly sampled with the inverse perspective scheme [9] within the target
depth range (visualization of the full 16 PSFs can be found in Supplement Fig. S3, S4).
For better visualization, the PSFs are cropped to the central 64 × 64 pixels and shown in
normalized amplitude.

For applications where MDE is of greater concern than AiF, the OAM model is likely the better
choice compared with the CR one, because it shows the potential to obtain a comparable depth
MAE together with a better AiF PSNR, and vice versa. Nevertheless, this superiority may be
too weak to survive manufacturing errors.
As noted before, the OAM model can theoretically degenerate into the CR one. In practice,
however, our studied OAMd and OAMt models did not converge to the CR one to obtain a better
overall performance. This may be attributed to the sub-optimal initialization manners, which
require a large l in each ring and are thus far from the CR-shaped configuration,
i.e., l → 0 in each ring. To validate this hypothesis, we optimized another OAM model
with zero initialization, i.e., all l = 0 and b = 0. The optimized results (Supplement 1, Fig.
S10) suggest that the parameter l in each ring may not be as effective in promoting the AiF-MDE
performance as expected, since only a few rings carry noticeable vortex features.


Fig. 4. Validation metrics from each epoch of the 20 studied optimization threads. The
depth MAE is shown on a logarithmic axis for visualization purposes. The colored
curves connect the outermost dots of each model, indicating the performance bounds. The
density distribution of the dots is visualized as colored clouds. The marker shapes
represent the results obtained with different Rw (see Supplement 1, Fig. S5, for the split
view). The region bounded by the solid black box indicates where the OAM models
outperform the CR one, compared to those bounded by dashed black boxes, irrespective of the PW
model. Each optimization thread takes about 33 hours, and the convergence during
training can be found in Supplement 1, Fig. S6, S7. In addition, two groups of selected
predictions varying with the weight ratios can be found in Supplement Fig. S8, S9.

3.2. PSF characterization


Extending the qualitative analysis in Fig. 3, we quantitatively analyze the PSF energy
concentration and the similarities between depth layers of the four models. The energy concentration
is evaluated by the proportion of energy around the central field of view, i.e., within the circular area
with a radius of 6 pixels. The shape similarity of PSFs at different depth layers is evaluated by the
pairwise correlation coefficient. As shown in Fig. 5, when Rw becomes larger, i.e., the E2E
optimization favors AiF imaging over depth estimation, the PSFs of all four models
exhibit an overall increase in energy concentration [Fig. 5(a)] as well as in shape similarity
[Fig. 5(b)]. Notably, lower shape similarity indicates more depth dependency. Moreover, the PSF
of the CR model possesses the highest energy concentration except for the PW model [Fig. 5(c)],
while the shapes of the PSFs of the two OAM models vary dramatically with depth [Fig. 5(d)].
The corresponding PSFs of the four models obtained from C′ and C″ at 16 depth layers can be
found in Supplement 1, Fig. S3, S4. Combining this observation with Fig. 3 and Fig. 4, it is
indicated that PSFs with higher energy concentration likely benefit AiF imaging, while
those featuring intensely depth-dependent shapes are more beneficial to depth estimation.
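The two PSF statistics used here, energy concentration within a 6-pixel radius and the pairwise correlation coefficient across depth layers, can be computed as below; the random arrays merely stand in for real PSF stacks, and the function names are ours.

```python
import numpy as np

def energy_concentration(psf, radius=6):
    """Fraction of PSF energy inside a centered disk of the given pixel radius."""
    cy, cx = (np.array(psf.shape) - 1) / 2.0
    y, x = np.indices(psf.shape)
    disk = np.hypot(y - cy, x - cx) <= radius
    return psf[disk].sum() / psf.sum()

def mean_pairwise_correlation(psf_stack):
    """Average correlation coefficient over all pairs of depth-layer PSFs."""
    flat = np.array([p.ravel() for p in psf_stack])
    cmat = np.corrcoef(flat)
    iu = np.triu_indices(len(psf_stack), k=1)   # upper-triangle (pairwise) entries
    return cmat[iu].mean()

# A perfectly concentrated PSF (a delta function) scores eta = 1:
delta = np.zeros((65, 65)); delta[32, 32] = 1.0
eta_delta = energy_concentration(delta)

rng = np.random.default_rng(2)
psfs = [rng.uniform(0.0, 1.0, (65, 65)) for _ in range(4)]   # stand-in PSF stack
eta = energy_concentration(psfs[0])
c = mean_pairwise_correlation(psfs)
```

High η favors AiF reconstruction; low c (strong depth dependence) favors MDE, matching the trend reported above.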

3.3. Robustness to fabrication error


Although the simulation results have revealed the pros and cons of the three investigated optics
models, possible fabrication errors would inevitably distort the optical response. To this end,
we model the fabrication error by adding Gaussian noise with different standard deviations (SDs)
to the height maps of the optimized DOEs. Specifically, the SDs are defined relative to the
maximum height hmax of the DOE profiles, which corresponds to a 2π phase
shift for the principal wavelength. Then, the performance of the optics models is assessed on the
validation set. The validation process is the same as that described in Section 3.1, except for the
difference in the height maps of the DOEs.

Fig. 5. PSF characterization. (a) and (b) are the average energy concentration and similarity
of the PSFs at 16 depth layers of the four models with different Rw. (c) and (d) are these
values further averaged over the different Rw for each model. The average energy concentration
and the pairwise correlation coefficient are denoted by η and c, respectively.
We performed five simulation trials for each model with different height-map error levels.
To mitigate random error, the validation result of each simulation trial is the
average of five repeated validations with different random seeds of the Gaussian noise. The resulting
image PSNR and depth MAE are illustrated in Fig. 6, under the assessment weight ratio Rw = 1 : 1.
As shown in Fig. 6(a), as the fabrication error increases, the image PSNR
of all optics models exhibits a decreasing trend. Notably, in contrast to the other two models,
the PW model is intensely affected, as expected. This is probably because the optimized DOE
possesses too many high-frequency features, which are the most likely to be corrupted during the
fabrication process; its resulting PSF is thereby more likely to deviate from the target distribution.
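The noise protocol described above, Gaussian height perturbations with an SD set relative to h_max and averaged over several seeds, can be sketched as follows; the index value and the stand-in height map are illustrative.

```python
import numpy as np

def perturbed_height(h, rel_sd, h_max, rng):
    """Add Gaussian height noise whose SD is given relative to h_max, the
    height producing a 2*pi phase shift at the principal wavelength."""
    return h + rng.normal(0.0, rel_sd * h_max, size=h.shape)

h_max = 545e-9 / 0.46                  # illustrative 2*pi height for n - 1 = 0.46
rng = np.random.default_rng(3)
h = rng.uniform(0.0, h_max, (64, 64))  # stand-in for an optimized DOE height map

# Repeat each error level over several noise seeds, as in the validation protocol:
trials = [perturbed_height(h, 0.03, h_max, np.random.default_rng(seed))
          for seed in range(5)]
```

Each perturbed height map would then be converted to a phase profile via Eq. (1) and pushed through the validation pipeline to obtain the averaged PSNR and MAE.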

Fig. 6. Visualization of the robustness of the investigated optics models to fabrication errors.
(a) and (b) are the image PSNR and depth MAE varied with the height-map error of the DOEs,
respectively. The level of the fabrication error is controlled by the SD relative to the
maximum height hmax of the DOE.

With respect to depth estimation [Fig. 6(b)], the PW and CR models exhibit more sensitivity
than the OAM models. Interestingly, however, the depth MAE of the PW and CR models,
especially the PW one, does not follow a monotonic trend like the image PSNR. This observation
indicates that a slight variation of the PSF may even boost the depth estimation performance. One
possibility is that, at the assessed checkpoints, these optics models had not yet been optimized
to a reasonable local optimum for depth estimation. In addition, the CR model exhibits
weak robustness to the fabrication error for MDE. Thus, we conclude that MDE may be more
sensitive to deviations of circular PSFs.

4. Conclusion
Specifying a model parameterization of deep optics that can appropriately balance
the optimization efficiency and imaging performance is crucial yet challenging. Built upon a
representative domain-specific application, we have investigated the impact of the DOE
parameterization on AiF image quality and depth estimation fidelity. Our conclusions are as
follows:

1. Comparing the three representative models with varying weight ratios between the AiF and
   MDE sub-goals, the PW model possesses the overall best performance in simulation.
   However, it may be surpassed by the CR one in real-world applications due to the fabrication
   challenges induced by its rough surface profile. For applications where high-quality MDE
   is a priority, the OAM models show a slightly better image PSNR at the same depth MAE
   as the CR one (and vice versa), but this weak superiority is most likely to be
   overwhelmed by fabrication errors. In addition, both the PW and OAM models may require
   complicated calibration during actual lens assembly, especially with multiple
   elements, while the CR one can significantly simplify this process owing to its rotationally
   symmetric feature. Overall, for the AiF-MDE task, near-optimal
   performance can be obtained with the CR model in most cases, especially considering
   its manufacturing feasibility and promising computational efficiency.
2. The results from the OAM model initialized with zeros indicate that a higher DoF of the
optics model does not always lead to better performance for specific tasks. This is still an
open question, where essential physical limitations of the optics model and optimization
challenges should be considered thoroughly.
3. A PSF with concentrated energy is more likely to benefit AiF image reconstruction,
   while a PSF with a depth-dependent shape is more suitable for MDE. Accordingly, we
   infer that a PSF possessing both high energy concentration and a distinct depth-dependent
   shape may boost the AiF imaging and MDE performance simultaneously. Nevertheless,
   more advanced PSF engineering approaches are needed.
4. The PW model would most likely suffer from larger fabrication errors than the
   other two models. Therefore, when parameterizing an optics model, robustness to
   fabrication deviations deserves sufficient attention. Intuitively, one can add fabrication-
   driven constraints, such as a quantization-aware scheme [27], into the optimization, or specify
   parameterization manners that are robust to fabrication defects, such as the CR
   or OAM models.

We believe these findings provide insightful guidelines for future deep optics designs. Last,
although orthogonal to the technical scope of this investigation, we would like to note several
limitations. The three optics models above form a comprehensive comparison involving
two extreme ends (PW and CR) and one in between (OAM). However, the possibilities
between the two extreme ends are infinite, and an exhaustive comparison is impractical. We also
note that although the Zernike model is not included in our comparison, it remains a common
choice for parameterizing the wavefront in deep optics. Moreover, most existing optical
elements, including camera lenses and DOEs, as well as the propagation methods, are modeled
under the paraxial approximation and spatial shift-invariance. In the non-paraxial case, the optics
models may demand further assessment via more rigorous modeling approaches [28,29]. In
addition, only three principal wavelengths for the sensor response are considered here, while
more spectral channels should be included to approach real-world experimental conditions.
Nevertheless, overcoming these two limitations may require considerably more computational
effort.
Funding. National Key Research and Development Program of China (2018YFA0701400); National Natural Science
Foundation of China (92050115).
Acknowledgments. The authors gratefully acknowledge Gordon Wetzstein, Cindy Nguyen, Rui Wang, and Edmund
Y. Lam for fruitful discussion.
Disclosures. The authors declare no conflicts of interest.
Data availability. Data underlying the results presented in this paper are not publicly available at this time but may
be obtained from the authors upon reasonable request.
Supplemental document. See Supplement 1 for supporting content.

References
1. T. Klinghoffer, S. Somasundaram, K. Tiwary, and R. Raskar, “Physics vs. learned priors: Rethinking camera and
algorithm design for task-specific imaging,” arXiv preprint arXiv:2204.09871 (2022).
2. L. Wang, T. Zhang, Y. Fu, and H. Huang, “Hyperreconnet: Joint coded aperture optimization and image reconstruction
for compressive hyperspectral imaging,” IEEE Trans. on Image Process. 28(5), 2257–2270 (2019).
3. W. Zhang, H. Song, X. He, L. Huang, X. Zhang, J. Zheng, W. Shen, X. Hao, and X. Liu, “Deeply learned broadband
encoding stochastic hyperspectral imaging,” Light: Sci. Appl. 10(1), 108 (2021).
4. H. Arguello, S. Pinilla, Y. Peng, H. Ikoma, J. Bacca, and G. Wetzstein, “Shift-variant color-coded diffractive spectral
imaging system,” Optica 8(11), 1424–1434 (2021).
5. E. Nehme, D. Freedman, R. Gordon, B. Ferdman, L. E. Weiss, O. Alalouf, T. Naor, R. Orange, T. Michaeli, and Y.
Shechtman, “Deepstorm3d: dense 3d localization microscopy and psf design by deep learning,” Nat. Methods 17(7),
734–740 (2020).
6. V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end
optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,”
ACM Trans. Graph. 37(4), 1–13 (2018).
7. X. Dun, H. Ikoma, G. Wetzstein, Z. Wang, X. Cheng, and Y. Peng, “Learned rotationally symmetric diffractive
achromat for full-spectrum computational imaging,” Optica 7(8), 913–922 (2020).
8. C. A. Metzler, H. Ikoma, Y. Peng, and G. Wetzstein, “Deep optics for single-shot high-dynamic-range imaging,” in
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2020), pp. 1375–1385.
9. H. Ikoma, C. M. Nguyen, C. A. Metzler, Y. Peng, and G. Wetzstein, “Depth from defocus with learned optics
for imaging and occlusion-aware depth estimation,” in 2021 IEEE International Conference on Computational
Photography (ICCP), (2021), pp. 1–12.
10. S.-H. Baek, H. Ikoma, D. S. Jeon, Y. Li, W. Heidrich, G. Wetzstein, and M. H. Kim, “Single-shot hyperspectral-depth
imaging with learned diffractive optics,” in Proceedings of the IEEE/CVF International Conference on Computer
Vision, (2021), pp. 2651–2660.
11. M. Ye, E. Johns, A. Handa, L. Zhang, P. Pratt, and G.-Z. Yang, “Self-supervised siamese learning on stereo image
pairs for depth estimation in robotic surgery,” arXiv preprint arXiv:1705.08260 (2017).
12. Y. Wang, W.-L. Chao, D. Garg, B. Hariharan, M. Campbell, and K. Q. Weinberger, “Pseudo-lidar from visual depth
estimation: Bridging the gap in 3d object detection for autonomous driving,” in Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, (2019), pp. 8445–8453.
13. W. Lee, N. Park, and W. Woo, “Depth-assisted real-time 3d object detection for augmented reality,” in ICAT, vol. 11
(2011), pp. 126–132.
14. P. L. Ruben, “Design and use of mass-produced aspheres at kodak,” Appl. Opt. 24(11), 1682–1688 (1985).
15. Y. Peng, Q. Fu, H. Amata, S. Su, F. Heide, and W. Heidrich, “Computational imaging using lightweight diffractive-
refractive optics,” Opt. Express 23(24), 31393–31407 (2015).
16. A. Nikonorov, R. Skidanov, V. Fursov, M. Petrov, S. Bibikov, and Y. Yuzifovich, “Fresnel lens imaging with
post-capture image processing,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops
(CVPRW), (2015), pp. 33–41.
17. S. Elmalem, R. Giryes, and E. Marom, “Learned phase coded aperture for the benefit of depth of field extension,”
Opt. Express 26(12), 15316–15331 (2018).
18. H. Haim, S. Elmalem, R. Giryes, A. M. Bronstein, and E. Marom, “Depth estimation from a single image using deep
learned phase coded mask,” IEEE Trans. Computat. Imaging 4(3), 298–310 (2018).
19. Q. Fu, H. Amata, and W. Heidrich, “Etch-free additive lithographic fabrication methods for reflective and transmissive
micro-optics,” Opt. Express 29(22), 36886–36899 (2021).
20. S. R. P. Pavani and R. Piestun, “High-efficiency rotating point spread functions,” Opt. Express 16(5), 3484–3489
(2008).
21. S. R. P. Pavani, M. A. Thompson, J. S. Biteen, S. J. Lord, N. Liu, R. J. Twieg, R. Piestun, and W. E. Moerner,
“Three-dimensional, single-molecule fluorescence imaging beyond the diffraction limit by using a double-helix point
spread function,” Proc. Natl. Acad. Sci. 106(9), 2995–2999 (2009).
22. S. Prasad, “Rotating point spread function via pupil-phase engineering,” Opt. Lett. 38(4), 585–587 (2013).
23. X. Liu, Y. Peng, S. Tu, J. Guan, C. Kuang, X. Liu, and X. Hao, “Generation of arbitrary longitudinal polarization
vortices by pupil function manipulation,” Adv. Photonics Res. 2(1), 2000087 (2021).
24. Y. Shechtman, S. J. Sahl, A. S. Backer, and W. E. Moerner, “Optimal point spread function design for 3d imaging,”
Phys. Rev. Lett. 113(13), 133902 (2014).
25. Y. Wu, V. Boominathan, H. Chen, A. Sankaranarayanan, and A. Veeraraghavan, “Phasecam3d—learning phase masks
for passive single view depth estimation,” in 2019 IEEE International Conference on Computational Photography
(ICCP), (IEEE, 2019), pp. 1–12.
26. E. R. Dowski and W. T. Cathey, “Extended depth of field through wave-front coding,” Appl. Opt. 34(11), 1859–1866
(1995).
27. L. Li, L. Wang, W. Song, L. Zhang, Z. Xiong, and H. Huang, “Quantization-aware deep optics for diffractive snapshot
hyperspectral imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
(2022), pp. 19780–19789.
28. F. Wyrowski and M. Kuhn, “Introduction to field tracing,” J. Mod. Opt. 58(5-6), 449–466 (2011).
29. S. Schmidt, T. Tiess, S. Schröter, R. Hambach, M. Jäger, H. Bartelt, A. Tünnermann, and H. Gross, “Wave-optical
modeling beyond the thin-element-approximation,” Opt. Express 24(26), 30188–30200 (2016).
