
Measurement 190 (2022) 110663


Accurate 3D reconstruction via fringe-to-phase network


Hieu Nguyen a,b, Erin Novak a, Zhaoyang Wang a,∗

a Department of Mechanical Engineering, The Catholic University of America, Washington, DC 20064, USA
b Neuroimaging Research Branch, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD 21224, USA
∗ Corresponding author. E-mail address: wangz@cua.edu (Z. Wang).

https://doi.org/10.1016/j.measurement.2021.110663
Received 5 July 2021; Received in revised form 2 December 2021; Accepted 23 December 2021; Available online 12 January 2022
0263-2241/© 2021 Published by Elsevier Ltd.

ARTICLE INFO

Keywords: Three-dimensional image acquisition; Three-dimensional sensing; Single-shot imaging; Structured light; Fringe-to-phase transformation; Convolutional neural network; Deep learning

ABSTRACT

Learning 3D shape representations from structured-light images for 3D reconstructions has become popular in many fields. This paper presents a new approach integrating a fringe-to-phase network with a fringe projection profilometry (FPP) technique to achieve 3D reconstructions with superior accuracy and speed performance. The proposed fringe-to-phase network has a UNet-like architecture, capable of retrieving three wrapped phase maps directly from a color image comprising three fringe patterns with designated frequencies. Because the phase maps contain the 3D shape representations of the measurement target, they serve as an intermediary to transform the single-shot fringe-pattern image into the 3D shapes of the target. The datasets with ground-truth phase labels are generated by using a tri-frequency FPP method. Unlike the existing techniques, the proposed approach yields both high-accuracy and fast-speed 3D reconstructions. Experiments have been accomplished to validate the proposed technique, which provides a promising tool for numerous scientific research and industrial applications.

1. Introduction

Single-shot 3D shape measurement and reconstruction, which is capable of measuring exterior dimensions of one or multiple objects from a single 2D image, has been a popular topic of interest in numerous scientific and industrial applications. A preeminent advantage of the single-shot 3D shape measurement is its real-time and high-speed characteristic, since it only requires capturing a single image. In the meantime, accuracy is another essential criterion to characterize the performance of a 3D shape measurement and reconstruction technique. Although it is desirable for a 3D reconstruction technique to possess both characteristics, there is often a trade-off between accuracy and speed in practice. The single-shot 3D shape measurement and reconstruction is no exception. This is because high measurement accuracy typically requires an input of multiple images, whereas fast measurement speed normally demands using fewer images. Consequently, increasing the measurement accuracy while maintaining the measurement speed is of great interest in the research and development of a single-shot 3D shape measurement and reconstruction technique.

A widely used scheme to enhance the accuracy of vision-based 3D shape measurement and reconstruction techniques is projecting texture patterns onto the surface of the target to facilitate the retrieval of the 3D geometric information. Consumer-grade RGB-D sensors [1–4] such as Microsoft Kinect, Apple TrueDepth camera, Asus Xtion, Intel RealSense, and Samsung uDepth are some recent representative products that adopt this facilitation process. The texture projection scheme is well known as the structured-light technique [5–7], where the imaging system mainly comprises a projector and one or multiple cameras. The projector emits structured patterns onto the surface of the target, and then the cameras capture the corresponding distorted patterns for the subsequent 3D reconstruction process. While an advanced structured-light technique typically yields high measurement accuracy [8–11], the measurement speed is often limited.

In the last decade, academic and industrial research have witnessed the successful integration and employment of deep learning, especially convolutional neural networks (CNNs), in many fields. The CNN is especially useful in computer vision applications such as object detection, pattern recognition, image translation, image segmentation, scene understanding, and so on [12–14]. In the optics and experimental mechanics fields, where computer-vision-assisted techniques are adopted to carry out measurements of physical quantities, CNNs have been employed in various applications, including fringe analysis, phase retrieval, interferogram denoising, deformation measurement, shape determination, etc. [15–18]. Particularly, in 3D shape measurement and reconstruction techniques, deep learning has found its path to improve the measurement performance based on supervised and unsupervised learning processes. The deep learning-based single-shot 3D reconstruction techniques mainly fall into two categories based on the input: a natural image or a structured-light image. In the former group, Laina et al. [19] improve the depth map estimation from a single RGB image by using a fully convolutional architecture combined with residual learning.


Fig. 1. Pipeline of recent 3D reconstruction approaches integrating structured light and deep learning. (a) fringe-to-depth strategy, and (b) fringe-to-phase strategy.

Some works [20,21] were able to determine the depth information of general scenes from a single image using deep convolutional networks. Furthermore, several refinement processes have been undertaken by the CNN models after the reconstruction of 3D geometry from a single RGB image [22]. In the second category, the structured-light techniques, with the advantage of forming features over textureless regions, have also received noteworthy support from deep learning, which helps enhance the measurement performance [23,24]. However, most of the current single-shot 3D shape measurement and reconstruction techniques involving deep learning use datasets generated by commercial RGB-D sensors [25,26]. The main problem with these types of sensors is that they fail to provide a high-accuracy depth map and, in turn, harm the abilities of a supervised learning model.

To overcome the challenging issues of preparing high-accuracy dataset labels, a commonly used structured-light technique named fringe projection profilometry (FPP) has been utilized by some researchers [27–30]. The FPP technique is one of the most established and widely used 3D shape measurement and reconstruction techniques owing to its high accuracy, as witnessed by numerous industrial 3D scanners. A major drawback of the FPP technique is that it relies on using a series of fringe images to achieve accurate 3D shape measurement. Therefore, the FPP technique is suitable for preparing ground-truth datasets, but it is unfit for fast measurements. Because the multiple images demanded by the FPP measurement are very similar, it is natural to envisage that integrating with the deep learning scheme can help the FPP technique substantially cut down the number of images required. Ideally, only a single image is needed. Recently, numerous research efforts have been devoted to this topic. At present, the deep learning-based FPP 3D shape measurement methods use either a fringe-to-depth strategy or a fringe-to-phase strategy. The fringe-to-depth strategy often involves using an autoencoder-based network to transform a single-shot fringe image to its corresponding 3D depth map directly [30–33]. A pipeline of the fringe-to-depth strategy is illustrated in Fig. 1(a). Although the fringe-to-depth strategy provides straightforward end-to-end processing, its shape measurement performance has been shown to be clearly inferior to that of the state-of-the-art conventional FPP technique in terms of accuracy.

On the other hand, the fringe-to-phase strategy uses neural networks to transform a fringe pattern into a certain intermediate output, from which subsequent processing is conducted to obtain the unwrapped phase and 3D shape information [34–36]. It has been shown that, compared with the direct fringe-to-depth scheme, such a two-stage approach can obtain improved results [37–40]. Fig. 1(b) demonstrates a variety of intermediate outputs that can be produced and are potentially useful for the 3D shape reconstruction. In some cases, multiple neural networks can be employed. For example, Feng et al. [41] use two sub-networks to transform a fringe image into the numerator and denominator of an arctangent function in phase calculation. The methods presented in [42,43] determine the absolute phase map from two separate networks, where the first network extracts the numerator and denominator from a single fringe pattern and the second network determines the integer fringe orders. Technically, however, it is of practical interest to use a single network to get the desired intermediate result from a single image.


Recently, Hu et al. [44] demonstrated that a wrapped phase map can be predicted from a single grayscale fringe pattern by a deep-learning network. Nevertheless, the method alone is insufficient for the 3D reconstruction of a target with geometric discontinuities because of phase ambiguities encountered in phase unwrapping. A solution to the phase ambiguity problem is to use tri-frequency fringe patterns and encode them into the R, G, and B channels of a color image [45]. Following this idea, Feng et al. [46] and Qian et al. [47] proposed using a network to transform three fringe patterns with different frequencies into six numerators and denominators, which can be used later to obtain wrapped and unwrapped phase maps. Nguyen et al. [48] used a deep-learning network and a color structured-light image to create multiple tri-frequency phase-shifted images.

Considering the robustness, advantages, and limitations of the fringe-to-depth and fringe-to-phase strategies, an approach of using a single end-to-end neural network to determine the phase distribution map from a single structured-light fringe image should be achievable. Like the image-to-image transformation using deep learning in the computer vision field, a single color fringe image and its corresponding wrapped phase maps can serve as a pair of input and output feature vectors for the end-to-end fringe-to-phase network. With numerous parameters, the fringe-to-phase network can be trained as a regressor to predict the phase distributions directly from a single fringe image. To prepare the training datasets for the network model, we employ a tri-frequency phase-shifted FPP technique. Moreover, the proposed approach is considerably more straightforward than the state-of-the-art conventional techniques that rely heavily on camera calibration, image registration or matching, and geometric triangulation computation.

Compared with the previous and recent work using the fringe-to-depth or fringe-to-phase strategy for 3D shape reconstruction, the key contributions of the proposed technique in this paper include:

1. An artificial neural network scheme integrating fringe projection and deep learning is proposed to transform a single fringe image into three wrapped phase distribution maps.
2. It uses a single end-to-end neural network for phase determination instead of multiple networks.
3. The network directly builds the wrapped phase maps of three different frequencies from a single color image without additional inputs.
4. The output of tri-frequency phase maps can handle the geometric discontinuities encountered in numerous applications to generate a full-field, truly unwrapped phase map.
5. It directly replaces the first essential step, i.e., wrapped phase calculation, in the state-of-the-art conventional fringe projection profilometry method, so it inherently and directly fits into the existing fringe projection technique. In addition, outputting wrapped phase distributions rather than unwrapped ones avoids extra handling complications.
6. It yields 3D shape reconstruction with accuracy higher than the fringe-to-depth network approach as well as other fast stereo vision methods such as digital image correlation and those employed by commercial RGB-D sensors.

The remaining sections of the paper are organized as follows. Section 2.1 describes the FPP technique for preparing the high-accuracy phase map labels and the algorithm of 3D reconstruction from phase distributions. Section 2.2 details the proposed fringe-to-phase network architecture. Section 3 presents a number of experimental results with accuracy assessment. Finally, Section 4 concludes with a summary and discussion.

Fig. 2. Illustration of the proposed 3D shape measurement and reconstruction system. (a) system setup, (b) raw structured-light pattern, and (c) captured single-shot image.

2. Methods

The proposed fringe-to-phase approach aims to extract the phase distributions from a single-shot structured-light image using a deep learning network. The supervised learning algorithm depends on using high-accuracy ground-truth labels to ensure a reliable learning operation and enhanced performance in real applications. Unlike the 2D-to-3D datasets captured by RGB-D sensors and those synthesized by computer-aided design, the proposed approach employs the aforementioned FPP technique to prepare the demanded high-accuracy ground-truth 3D labels. Specifically, a series of fringe-pattern images are captured for each data sample to generate two types of datasets: fringe-to-phase and fringe-to-depth datasets. The latter datasets are for comparison purposes only.

2.1. FPP technique for generating phase datasets and subsequent 3D reconstruction

The FPP-based 3D shape measurement and reconstruction system usually consists of two components: a camera and a projector. Fig. 2 illustrates the schematic of such a system, and it is employed by the proposed single-shot 3D shape measurement and reconstruction. Fundamentally, the projector can be treated as a reversed camera, so the system is similar to a stereo vision system. In practice, the FPP algorithms have considerable differences from the stereo vision algorithms. During measurement and reconstruction, the projector illuminates the user-determined structured-light patterns, i.e., fringe patterns, on the object's surface. Meanwhile, the synchronized camera captures the distorted patterns. The captured fringe-pattern images are then analyzed to obtain the 3D depth map following a process including three main steps: (1) phase extraction, (2) phase unwrapping, and (3) depth determination. The first step typically requires capturing over 10 images (e.g., 12), making real-time and high-speed shape measurements impossible. The basic idea of the proposed technique is to replace the first step with a deep learning approach so that only a single image is required. The other two steps remain the same for assured accuracy. With this said, the adopted FPP-based 3D shape measurement and reconstruction technique is introduced as follows. It is noted that although the proposed approach requires only a single image, the deep learning part depends on the conventional multi-image algorithms to generate the training datasets.


Fig. 3. Flowchart of the FPP technique.

To facilitate the process of phase determination in the conventional algorithms, the proposed approach employs a tri-frequency four-step (TFFS) phase-shifting scheme. With such a scheme, the original fringe patterns emitted from the projector are generated by using the following function [49–52]:

I_{ij}^{(e)}(u, v) = I_0^{(e)} [1 + cos(φ_j(u) + δ_i)]    (1)

In the equation, I^{(e)} is the intensity of the emitted pattern at the pixel coordinate (u, v); subscripts i and j denote the ith phase-shifted image and the jth frequency, respectively, with i = {1, 2, 3, 4} and j = {1, 2, 3}; I_0^{(e)} is a constant intensity of fringe modulation and is often set to 127.5; δ is the phase-shift amount with δ_i = (i − 1)π/2; and φ is the fringe phase with φ_j(u) = 2π f_j u / W, where f is the total number of fringes in each pattern with f = {61, 70, 80} and W is the width of the generated image. It can be seen from the equation that there are 12 generated images in total, and they are evenly-spaced vertical sinusoidal fringe patterns.

After being captured by the synchronized camera, the fringe patterns in the captured images are normally distorted by the geometric shape of the measurement target. They can be described as:

I_{ij}(u, v) = I_m(u, v) + I_a(u, v) cos[φ_j(u, v) + δ_i]    (2)

where I_m, I_a, and I are the mean or background intensity, fringe amplitude, and image intensity of the captured patterns at pixel coordinate (u, v), respectively. The phase φ in the captured images is now a function of both u and v, and it can be determined as:

φ_j^w = arctan[(I_{4j} − I_{2j}) / (I_{1j} − I_{3j})]    (3)

From now on, the pixel coordinate (u, v) will be omitted from the equations for simplification. In Eq. (3), the phase φ is wrapped because the arctangent function yields an output range of [−π, π], so a superscript w is added to φ.
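To make Eqs. (1)–(3) concrete, a minimal NumPy sketch of the TFFS pattern generation and the four-step wrapped-phase computation is given below. The function names, the 800-pixel pattern height, and the use of arctan2 to resolve the quadrant are illustrative assumptions rather than the authors' implementation.

import numpy as np

def generate_tffs_patterns(width=1280, height=800, freqs=(61, 70, 80), i0=127.5):
    """Generate the 12 tri-frequency four-step (TFFS) fringe patterns of Eq. (1).

    Returns an array of shape (3, 4, height, width): 3 frequencies x 4 phase shifts.
    """
    u = np.arange(width)
    patterns = np.empty((len(freqs), 4, height, width))
    for j, f in enumerate(freqs):
        phi = 2.0 * np.pi * f * u / width            # fringe phase along the u axis
        for i in range(4):
            delta = i * np.pi / 2.0                  # phase shifts 0, pi/2, pi, 3pi/2
            row = i0 * (1.0 + np.cos(phi + delta))   # evenly-spaced vertical fringes
            patterns[j, i] = np.tile(row, (height, 1))
    return patterns

def wrapped_phase(i1, i2, i3, i4):
    """Four-step wrapped phase of Eq. (3); arctan2 yields the full (-pi, pi] range."""
    return np.arctan2(i4 - i2, i1 - i3)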
While it is generally impossible to get the unwrapped phase map solely from a single fringe image if the target has geometric discontinuities and there are multiple fringes on the target, the challenging phase-unwrapping task can be handled well with the TFFS phase-shifting algorithm as follows [48–50]:

φ^w_{1−2} = φ^w_2 − φ^w_1 + { 0 if φ^w_2 ⩾ φ^w_1 ;  2π if φ^w_2 < φ^w_1 }    (4a)
φ^w_{2−3} = φ^w_3 − φ^w_2 + { 0 if φ^w_3 ⩾ φ^w_2 ;  2π if φ^w_3 < φ^w_2 }    (4b)
φ_{1−2−3} = φ^w_{2−3} − φ^w_{1−2} + { 0 if φ^w_{2−3} ⩾ φ^w_{1−2} ;  2π if φ^w_{2−3} < φ^w_{1−2} }    (4c)
φ_{2−3} = φ^w_{2−3} + INT[ (φ_{1−2−3}(f_3 − f_2) − φ^w_{2−3}) / 2π ] · 2π    (4d)
φ = φ_3 = φ^w_3 + INT[ (φ_{2−3} f_3/(f_3 − f_2) − φ^w_3) / 2π ] · 2π    (4e)

where φ with and without superscript w are the wrapped phase and unwrapped phase at (u, v), respectively; INT is the function of rounding to the nearest integer; φ_{j−k} is the subtraction of φ_j from φ_k, with (f_k − f_j) wrapped fringes in the phase map. The fundamental idea of the algorithm is that φ_{1−2−3} is both wrapped and unwrapped because there is only one fringe in the pattern, and this allows using a hierarchical phase-unwrapping process to bridge φ_{1−2−3} and φ_3 through φ_{2−3}. The phase distribution of the highest-frequency fringe patterns, φ_3, is adopted in the final phase determination because it yields the highest accuracy.

The rationale for choosing the fringe numbers f_1 = 61, f_2 = 70, and f_3 = 80 is as follows: in the FPP technique, the period of fringes in the original images is usually between 8 and 20 pixels. A value beyond this range generally leads to reduced accuracy in practice. The horizontal resolution of the projector used in this work is 1280 pixels, and a selected primary period of 16 pixels yields 80 fringes in the image. With the hierarchical phase unwrapping algorithm shown in Eq. (4), the combination of adopted fringe numbers provides a roughly uniform hierarchical ratio of 1:10:80 (note: f_{1−2} = f_2 − f_1 = 9, f_{2−3} = f_3 − f_2 = 10, f_{1−2−3} = f_{2−3} − f_{1−2} = 1; f_{1−2−3}, f_{2−3}, and f_3 are used in phase unwrapping) to ensure reliable processing. It is better than, for instance, a ratio of 1:4:80 produced by fringe numbers 73, 76, and 80, because the latter gives imbalanced ratios of 1:4 and 4:80. As mentioned previously, the 80-fringe pattern determines the final phase, and the 61- and 70-fringe patterns simply help determine the integer fringe orders during phase unwrapping.

It is important to point out that the fringe-to-phase network introduced in the following section aims to output three wrapped phase distributions, φ^w_1, φ^w_2, and φ^w_3, using a single color image as its input. Like the conventional technique, the proposed one depends on Eq. (4) to obtain the unwrapped phase map. Refraining from outputting the unwrapped phase allows the network to dedicate deep learning to an accuracy-enhanced determination of wrapped phases.
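The hierarchical unwrapping of Eqs. (4a)–(4e) translates almost line-for-line into array operations. The following sketch assumes the three wrapped phase maps are given in radians; the function and variable names are illustrative.

import numpy as np

def unwrap_tffs(phi1_w, phi2_w, phi3_w, f=(61, 70, 80)):
    """Hierarchical temporal phase unwrapping following Eqs. (4a)-(4e).

    phi1_w, phi2_w, phi3_w: wrapped phase maps (radians) of the 61-, 70-, and
    80-fringe patterns. Returns the unwrapped phase of the 80-fringe pattern.
    """
    f1, f2, f3 = f
    # Eqs. (4a)-(4c): beat phases; add 2*pi wherever the difference is negative
    phi12 = np.where(phi2_w >= phi1_w, phi2_w - phi1_w, phi2_w - phi1_w + 2*np.pi)
    phi23 = np.where(phi3_w >= phi2_w, phi3_w - phi2_w, phi3_w - phi2_w + 2*np.pi)
    phi123 = np.where(phi23 >= phi12, phi23 - phi12, phi23 - phi12 + 2*np.pi)
    # Eq. (4d): unwrap the (f3 - f2)-fringe beat phase using the single-fringe map
    phi23_u = phi23 + 2*np.pi*np.rint((phi123*(f3 - f2) - phi23) / (2*np.pi))
    # Eq. (4e): unwrap the highest-frequency phase using the unwrapped beat phase
    phi3_u = phi3_w + 2*np.pi*np.rint((phi23_u*f3/(f3 - f2) - phi3_w) / (2*np.pi))
    return phi3_u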


Fig. 4. Demonstration of the input and output pair in datasets. (a) fringe-to-phase datasets; (b) fringe-to-depth datasets.

The height or depth map of the measurement target can be calculated using the geometric triangulation information. In practice, the calculation can be implemented in an alternative and fast way as [51–53]:

z = (c [p ⊗ Φ]^⊤) / (d [p ⊗ Φ]^⊤)
c = [1  c_1  c_2  c_3  ⋯  c_28  c_29]
d = [d_0  d_1  d_2  d_3  ⋯  d_28  d_29]        (5)
p = [1  u  v  u^2  uv  v^2  u^3  u^2v  uv^2  v^3  u^4  u^3v  u^2v^2  uv^3  v^4]
Φ = [1  φ]

where z is the physical depth/height or z-coordinate at the point being imaged as pixel (u, v) in the captured image; c and d contain 59 constant coefficients of the measurement system; ⊗ represents a Kronecker product; and φ is the unwrapped phase at pixel (u, v), determined from Eq. (4). System parameters c_1–c_29 and d_0–d_29 can be determined by a calibration process in advance [54]. After obtaining z, the other two coordinates x and y can be easily calculated when the intrinsic parameters of the camera are known. Therefore, the four terms of height measurement, depth measurement, 3D shape measurement, and 3D reconstruction can often be used interchangeably.
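Once the 59 system coefficients are calibrated, Eq. (5) is a direct polynomial evaluation. A hedged sketch is given below; the ordering of the Kronecker-product terms and of the coefficient vectors is an assumption for illustration only.

import numpy as np

def depth_from_phase(phi, c, d):
    """Evaluate Eq. (5): z = (c . (p kron Phi)) / (d . (p kron Phi)).

    phi : unwrapped phase map of shape (H, W)
    c   : 30 numerator coefficients  [1, c1, ..., c29]
    d   : 30 denominator coefficients [d0, d1, ..., d29]
    """
    h, w = phi.shape
    v, u = np.mgrid[0:h, 0:w].astype(float)
    # 15 bivariate monomials of p, up to fourth order in u and v
    p = np.stack([np.ones_like(u), u, v, u**2, u*v, v**2,
                  u**3, u**2*v, u*v**2, v**3,
                  u**4, u**3*v, u**2*v**2, u*v**3, v**4])
    # Kronecker product p ⊗ [1, phi]: each monomial followed by its phi-weighted copy
    k = np.concatenate([np.stack([t, t*phi]) for t in p])   # shape (30, H, W)
    num = np.tensordot(np.asarray(c), k, axes=1)
    den = np.tensordot(np.asarray(d), k, axes=1)
    return num / den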
Fig. 3 illustrates the flowchart of the FPP technique. The first portion, i.e., wrapped phase determination, is replaced by a fringe-to-phase network in the proposed approach. The remaining portion, from the wrapped phase to the depth map, stays the same for both the conventional and the proposed techniques. Such a scheme can preserve the 3D reconstruction accuracy while increasing the overall speed.

For the generation of the datasets, tens of small sculptures with various sizes and shapes are chosen as object samples. In the supervised deep learning process, massive datasets are required to ensure reliable learning and network tuning. Therefore, each sculpture is arbitrarily positioned and oriented several times in the system view to serve as a large number of different samples. Besides capturing a single object every time, the system also captures multiple objects randomly grouped and positioned together in many different ways to enlarge the datasets.

During image capture, the system projects 13 patterns on each target sample and synchronously captures 13 corresponding images. The first projection image is a color composite of three original fringe patterns with frequencies 61, 70, and 80, respectively, and they are loaded in the RGB channels of the image. The remaining 12 projection images are the previously mentioned TFFS phase-shifted grayscale images. Accordingly, the first captured color image serves as the input image, and the following 12 grayscale images are employed to generate the corresponding ground-truth labels, i.e., three wrapped phase maps for each output, with Eq. (3). It is noteworthy that the conventional FPP technique must rely on using non-composite grayscale images to obtain high accuracy, because using an RGB image to get three grayscale images often results in notable noise in practice. In contrast, it has been shown that using the RGB channels of color images works well in deep learning work.

Two types of datasets can be generated: one is the fringe-to-phase dataset, where the input and output are the single color fringe image and the wrapped phase maps; the other is the fringe-to-depth dataset, where the single color fringe image and the depth map are the input and output, respectively. The generation of the fringe-to-depth datasets involves using Eqs. (3)–(5), and they are prepared for performance comparison with the proposed approach. Fig. 4(a)–(b) displays some exemplars of the input and output pairs in the fringe-to-phase and fringe-to-depth datasets, respectively. In total, 1500 data samples are prepared. To ensure reliable network convergence and avoid biased evaluation, the samples are split by a ratio of 80%-10%-10% into the training, validation, and test datasets. Moreover, the test dataset is selected in such a way that the objects in it never appear in the other datasets.
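One possible way to organize and normalize such a dataset is sketched below: the wrapped phases are mapped from (−π, π) to (0, 1) to match the sigmoid output head described in Section 2.2, the images are rescaled to 0–1, and the samples are split 80%-10%-10%. This is an illustrative outline only; in particular, the paper holds out the test objects entirely, which a purely random split does not guarantee.

import numpy as np

def to_unit_range(phase_w):
    """Map wrapped phase from (-pi, pi) to (0, 1) for the sigmoid output head."""
    return (phase_w + np.pi) / (2.0 * np.pi)

def split_dataset(x, y, seed=0):
    """80%-10%-10% train/validation/test split of paired arrays.

    Note: a random split is shown for brevity; the paper selects the test set so
    that its objects never appear in the training or validation sets.
    """
    n = len(x)
    idx = np.random.default_rng(seed).permutation(n)
    n_tr, n_va = int(0.8 * n), int(0.1 * n)
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (x[tr], y[tr]), (x[va], y[va]), (x[te], y[te])

# x: color fringe images rescaled to 0-1, shape (s, 480, 640, 3)
# y: three wrapped phase maps mapped to 0-1, shape (s, 480, 640, 3)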


Fig. 5. Architecture of the proposed fringe-to-phase network.

2.2. Fringe-to-phase network

The proposed fringe-to-phase network is similar to an image-to-image or pattern-to-pattern transformation network. The data structure of the input and output in the datasets is a four-dimensional array of shape [s, h, w, c], where the four variables are the number of data samples, the resolutions of the input image or output phase map in height and width, and the channel depth, respectively. Specifically, c = 3 since the color image input has three RGB channels and the network learns to extract wrapped phase distributions from each channel.

The transformation network comprises two branches: an encoder branch mapping the input into a feature code and a decoder branch mapping the feature code to the output. Like many other autoencoder-based networks, the encoder and decoder branches consist of a number of 2D 3 × 3 convolutions, 2 × 2 max-pooling operations, and corresponding transposed convolutions as well as symmetrical concatenations to enable precise data transformation for accurate phase representation learning. In the network model, specifically, the encoder branch has ten 2D convolution layers and four max-pooling layers. These layers fulfill the geometric feature transformations by enlarging the channel depths and decreasing the spatial resolutions to extract valuable representations from the input arrays. The decoder branch, on the contrary, is composed of four transposed convolution layers and eight normal convolution layers. The decoder layers gradually magnify the feature maps extracted from the encoder branch to higher-resolution output maps while reducing the channel depths. Similar to the well-known UNet, the network does not include fully connected layers. In each convolution layer, a padding strategy is used to add additional rows and columns to the border of the input arrays to keep the resolution of the response feature maps equal to that of the input.

Furthermore, a convolution operation with a kernel size of 1 × 1, three filters, and a sigmoid function is connected to the end of the decoder branch to convert the feature arrays to the corresponding wrapped phase outputs. Since the range of the wrapped phase is (−π, π) and can be easily converted to or from a range of (0, 1), a sigmoid function rather than the regular linear activation function is used. In addition, because the fringe-to-phase transformation is a regression process, one of the most common regression loss functions, the mean squared error (MSE), is selected as the loss function for optimizing the model parameters and monitoring the network convergence. Fig. 5 illustrates the structure of the proposed fringe-to-phase network. Additionally, Table 1 describes the network architecture layer by layer and lists the total number of parameters.
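The architecture described above can be sketched in Keras as a U-Net-style encoder–decoder. The code below mirrors the layer counts and filter sizes of Table 1, but the helper functions and exact layer ordering are simplified assumptions, not the released implementation; the 2 × 2 transposed-convolution kernels are inferred from the parameter counts listed in Table 1.

import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions with 'same' padding and LeakyReLU activations
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.LeakyReLU(alpha=0.1)(x)
    return x

def build_fringe_to_phase(input_shape=(480, 640, 3)):
    inputs = layers.Input(shape=input_shape)
    skips, x = [], inputs
    # Encoder: four conv pairs, each followed by 2x2 max pooling
    for filters in (32, 64, 128, 256):
        x = conv_block(x, filters)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, 512)
    x = layers.Dropout(0.2)(x)
    # Decoder: transposed convolutions, skip concatenations, conv pairs
    for filters, skip in zip((256, 128, 64, 32), reversed(skips)):
        x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = conv_block(x, filters)
    # 1x1 convolution with three filters and sigmoid for the three wrapped phase maps
    outputs = layers.Conv2D(3, 1, activation="sigmoid")(x)
    return Model(inputs, outputs)

model = build_fringe_to_phase()
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")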
In the optimization process of supervised learning, a backpropagation algorithm is utilized to minimize the loss function associated with the training and validation datasets by optimizing the network parameters after each epoch. The backpropagation computation is performed on top of an affine transformation, where each output unit can be defined as [55,56]:

f = g(w^⊤ x + b) = { w^⊤ x + b,   if w^⊤ x + b > 0 ;   α(w^⊤ x + b),   if w^⊤ x + b ⩽ 0 }    (6)

where x and f are the input and output vectors, respectively; g indicates an element-wise activation function; w is a matrix of weight parameters; b is a vector of bias parameters; and α is a negative slope coefficient. The learning process involves finding the best parameters w and b to minimize the loss, which represents the difference between the actual network output and the desired output. In the proposed network, the adopted activation function is the leaky rectified linear unit (LeakyReLU). It is applied over the hidden layers to avoid the vanishing-gradient and zero-gradient problems [56] often faced by other functions such as the rectified linear unit (ReLU) and the hyperbolic tangent (Tanh). Additionally, a dropout regularization is applied to the desired layer to prevent possible overfitting. Details on the training process and hyperparameter tuning are given in the following section.
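Eq. (6) is simply an affine map followed by a leaky rectifier; a short NumPy rendering (with the negative slope α = 0.1 used later) makes the piecewise definition explicit.

import numpy as np

def leaky_affine_unit(x, w, b, alpha=0.1):
    """Eq. (6): f = g(w^T x + b), with g(z) = z for z > 0 and alpha*z otherwise."""
    z = w.T @ x + b
    return np.where(z > 0, z, alpha * z)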


Table 1
The proposed fringe-to-phase network architecture and layer parameters.

Layer | Filters | Kernel/pool size | Stride | Activation | Output size | Params
Encoder:
input | – | – | – | – | 480 × 640 × 3 | 0
conv | 32 | 3 × 3 | 1 | LeakyReLU | 480 × 640 × 32 | 896
conv | 32 | 3 × 3 | 1 | LeakyReLU | 480 × 640 × 32 | 9,248
max pool | – | 2 × 2 | 2 | – | 240 × 320 × 32 | 0
conv | 64 | 3 × 3 | 1 | LeakyReLU | 240 × 320 × 64 | 18,496
conv | 64 | 3 × 3 | 1 | LeakyReLU | 240 × 320 × 64 | 36,928
max pool | – | 2 × 2 | 2 | – | 120 × 160 × 64 | 0
conv | 128 | 3 × 3 | 1 | LeakyReLU | 120 × 160 × 128 | 73,856
conv | 128 | 3 × 3 | 1 | LeakyReLU | 120 × 160 × 128 | 147,584
max pool | – | 2 × 2 | 2 | – | 60 × 80 × 128 | 0
conv | 256 | 3 × 3 | 1 | LeakyReLU | 60 × 80 × 256 | 295,168
conv | 256 | 3 × 3 | 1 | LeakyReLU | 60 × 80 × 256 | 590,080
max pool | – | 2 × 2 | 2 | – | 30 × 40 × 256 | 0
conv | 512 | 3 × 3 | 1 | LeakyReLU | 30 × 40 × 512 | 1,180,160
conv | 512 | 3 × 3 | 1 | LeakyReLU | 30 × 40 × 512 | 2,359,808
dropout | – | – | – | – | 30 × 40 × 512 | 0
Decoder:
transpose conv | 256 | 3 × 3 | 2 | – | 60 × 80 × 256 | 524,544
concat | – | – | – | – | 60 × 80 × 512 | 0
conv | 256 | 3 × 3 | 1 | LeakyReLU | 60 × 80 × 256 | 1,179,904
conv | 256 | 3 × 3 | 1 | LeakyReLU | 60 × 80 × 256 | 590,080
transpose conv | 128 | 3 × 3 | 2 | – | 120 × 160 × 128 | 131,200
concat | – | – | – | – | 120 × 160 × 256 | 0
conv | 128 | 3 × 3 | 1 | LeakyReLU | 120 × 160 × 128 | 295,040
conv | 128 | 3 × 3 | 1 | LeakyReLU | 120 × 160 × 128 | 147,584
transpose conv | 64 | 3 × 3 | 2 | – | 240 × 320 × 64 | 32,832
concat | – | – | – | – | 240 × 320 × 128 | 0
conv | 64 | 3 × 3 | 1 | LeakyReLU | 240 × 320 × 64 | 73,792
conv | 64 | 3 × 3 | 1 | LeakyReLU | 240 × 320 × 64 | 36,928
transpose conv | 32 | 3 × 3 | 2 | – | 480 × 640 × 32 | 8,224
concat | – | – | – | – | 480 × 640 × 64 | 0
conv | 32 | 3 × 3 | 1 | LeakyReLU | 480 × 640 × 32 | 18,464
conv | 32 | 3 × 3 | 1 | LeakyReLU | 480 × 640 × 32 | 9,248
conv | 3 | 1 × 1 | 1 | Sigmoid | 480 × 640 × 3 | 99
Total | | | | | | 7,760,163

3. Experiments

A number of qualitative and quantitative experiments have been implemented to validate the effectiveness, capability, and robustness of the proposed fringe-to-phase approach for 3D shape measurement and reconstruction. In the experiments, the dataset images are captured by using an RVBUST RVC-X mini 3D camera that can capture each set of the required 13 images in less than 0.2 s. To accommodate the memory constraints in the training, the image resolution has been reduced to 640 × 480 pixels. The key hardware units and components used in the experiments include an Intel Xeon Gold 6140 2.3 GHz CPU, 128 GB RAM, and two Nvidia Tesla V100 SXM2 32 GB graphics cards. Moreover, Nvidia CUDA Toolkit 11.0 and cuDNN v8.0.3 are adopted to enable the preeminence of the above hardware system. Tensorflow, an open-source platform with comprehensive tools for deep learning, and Keras, a free and user-friendly application programming interface for Tensorflow, are chosen for the network construction and subsequent learning tasks.

3.1. Hyperparameter tuning

In the supervised learning process, not only must the weight and bias parameters of the neural networks be optimized, but the hyperparameters also need to be tuned for efficient training. The intensities of each channel of the color image input are rescaled to a range of 0 to 1 to improve the training convergence and stabilize the learning process. In the proposed work, a simple grid searching process is utilized to find the proper hyperparameter mixture. Overall, the fringe-to-phase network is optimized through 200 epochs with a batch size of 2, an Adam optimization scheme [57] with a learning-rate decay, and an initial learning rate of 10^−4. Specifically, the learning rate is fixed at 10^−4 for the first 100 epochs and gradually reduces after that. There are three typical gradient descent algorithms available for neural network optimization: batch gradient descent, stochastic or online gradient descent, and minibatch gradient descent [55]. The proposed approach uses the minibatch gradient descent algorithm to overcome the computation memory limitation, and the small batch size of 2 provides frequent updates on the network parameters for fast learning [58]. In the network architecture, the LeakyReLU negative slope coefficient α and the dropout rate are set to 0.1 and 0.2, respectively. Data augmentation is applied to the training and validation datasets to improve the performance and accuracy of the network model.

Several Keras callbacks that can perform actions at various stages of training are incorporated into the learning process to monitor the learning rate (LearningRateScheduler), save the best model (ModelCheckpoint), and save the history plot of the training (History), etc. Particularly, the model checkpoint only updates the saved model whenever a lower validation loss is observed after each epoch. Fig. 6 shows an example of the history plot of the loss function during the training process of the fringe-to-phase network with two different last activation functions: sigmoid and linear functions. It is noticed that the sigmoid activation function helps the network converge better than the linear activation function. The training time of 200 epochs for both cases takes less than three hours.

Fig. 6. Performance comparison of sigmoid and linear activation functions during the training process.
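A hedged Keras sketch of this training configuration follows (batch size 2, 200 epochs, Adam at 1e−4 held for 100 epochs and then decayed, and a checkpoint on the validation loss). The names model, x_train, x_val, and the exponential decay factor are assumptions carried over from the earlier sketches, since the paper only states that the rate "gradually reduces".

import tensorflow as tf

def lr_schedule(epoch, lr):
    # Keep 1e-4 for the first 100 epochs, then decay gradually (assumed exponential)
    return 1e-4 if epoch < 100 else lr * 0.97

callbacks = [
    tf.keras.callbacks.LearningRateScheduler(lr_schedule),
    tf.keras.callbacks.ModelCheckpoint("best_fringe2phase.h5",
                                       monitor="val_loss",
                                       save_best_only=True),
]

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=200, batch_size=2,
                    callbacks=callbacks)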
3.2. Quantitative assessment and qualitative evaluation

The proposed deep learning network aims to extract phase maps from a single fringe-pattern image, and the phase maps are then employed by conventional algorithms to perform accurate 3D shape reconstruction. Therefore, the first experiment is conducted to directly verify the capabilities of phase determination and 3D reconstruction of the proposed fringe-to-phase strategy. In the experiment, three randomly selected test images are fed into the trained fringe-to-phase network to get the predicted wrapped phase outputs. The wrapped phase distributions are then substituted into Eq. (4) to retrieve the unwrapped phase distributions, which are subsequently plugged into Eq. (5) to determine the depth or height map as well as the final 3D point clouds and shape reconstruction.
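Putting the pieces together, this single-shot reconstruction path reduces to a few calls; unwrap_tffs and depth_from_phase are the illustrative helpers sketched earlier and model is the trained network, so none of these names come from the paper itself.

import numpy as np

def reconstruct_single_shot(color_image, model, c, d):
    """Single-shot reconstruction: fringe image -> wrapped phases -> depth map."""
    x = color_image[np.newaxis].astype(np.float32) / 255.0        # rescale to 0-1
    pred = model.predict(x)[0]                                     # (480, 640, 3) in 0-1
    phases_w = pred * 2.0 * np.pi - np.pi                          # back to (-pi, pi)
    phi3_u = unwrap_tffs(phases_w[..., 0], phases_w[..., 1], phases_w[..., 2])
    return depth_from_phase(phi3_u, c, d)                          # Eq. (5)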


Fig. 7. Phase errors and 3D shape measurements of three test objects obtained from the proposed fringe-to-phase network.

It is well known that the temporal structured-light 3D shape measurement techniques outperform the conventional single-shot ones in terms of measurement accuracy. As mentioned previously, the primary difference between the two foregoing techniques lies in the number of inputs. The temporal strategy requires multiple (typically over 10) images to acquire high-accuracy 3D geometric information, whereas the single-shot one uses a single image to get 3D shape information at a fast speed. Because the dataset output labels are prepared with the TFFS phase-shifting technique, which is a specific temporal structured-light 3D shape measurement technique, it is convenient to compare the proposed fringe-to-phase technique with the state-of-the-art temporal structured-light technique without replacing the capturing system or modifying the system configuration.

Fig. 7 shows some results obtained from the proposed technique as well as the conventional FPP-based technique. In the figure, the first column in each row displays the input image of a representative test object, followed by the phase differences between the unwrapped phase map determined by the proposed technique and the ground-truth unwrapped phase map. The last two columns demonstrate the ground-truth 3D shapes obtained by the FPP technique and the 3D shapes reconstructed by the proposed technique, respectively. The unit of the phase errors is fringes (1 fringe = 2π rad) in the figure. The phase-difference maps indicate that large prediction errors are mainly located along the edges of abrupt fringe phase change, and the errors in other regions are quite small. The full-field mean phase error is normally smaller than 0.0005 fringes or 0.003 rad. For better visual comparison, the background in the reconstructed 3D images has been removed. It is evident that the temporal structured-light technique yields higher accuracy than the proposed single-shot technique; meanwhile, it can also be seen from the figure that the accuracy of the fringe-to-phase approach is acceptable. Especially, even though the surfaces of the 3D shapes reconstructed from the proposed technique carry certain blurry and noisy effects, some structured details are preserved on the reconstructed surfaces.

As mentioned in the introduction section, there are two recent approaches to integrating the deep learning scheme and the structured-light technique: the fringe-to-depth method and the fringe-to-phase method. Accordingly, the second experiment is implemented to compare their performances. The adopted FPP technique described in Section 2.1 can prepare the datasets for both approaches, which makes a comparison between them more accessible. The two types of datasets can share the same input images, and the output labels are the wrapped phase maps and the further obtained depth maps, respectively. Besides that, the network architecture for the fringe-to-depth approach is slightly modified to accommodate the output requirement of the depth map, in which the channel depth is changed from three to one (i.e., c = 1). Furthermore, the last activation function is switched from a sigmoid to a linear activation function since the height or depth values are continuous variables.

Fig. 8. 3D reconstruction results of the proposed fringe-to-phase technique and the comparative fringe-to-depth method.


Fig. 9. 3D reconstructions of geometrically separated objects.

Fig. 8 shows a comparison of 3D reconstructions of two test samples acquired by the proposed fringe-to-phase approach and the comparative fringe-to-depth network. In the figure, the first row exhibits two test inputs; each of the second and third rows presents the 3D ground-truth label, the 3D shape reconstructed by the fringe-to-phase network, and the result from the fringe-to-depth network, respectively. It can be observed in Fig. 8 that by using the phase information as an intermediary before depth and 3D shape reconstruction, the fringe-to-phase approach is capable of acquiring 3D geometric shapes with smaller errors than the fringe-to-depth approach.

Table 2 lists six statistical metrics from the depth results obtained using the fringe-to-phase and fringe-to-depth methods. The metrics comprise root-mean-square error (RMSE), mean absolute error (MAE), median error, trimean error, and means of the best and worst 25%, respectively. Both validation and test datasets are included in the table. The metrics demonstrate again that the fringe-to-phase approach yields higher accuracy than the fringe-to-depth one.

Table 2
Quantitative performance comparison of the fringe-to-phase and fringe-to-depth networks (unit: mm).

Method | fringe-to-phase | | fringe-to-depth |
 | Validation | Test | Validation | Test
RMSE | 0.1218 | 0.1524 | 0.5673 | 0.6376
Mean | 0.0441 | 0.0721 | 0.3363 | 0.3738
Median | 0.0431 | 0.0728 | 0.3239 | 0.3511
Trimean | 0.0435 | 0.0731 | 0.3310 | 0.3622
Best 25% | 0.0317 | 0.0545 | 0.2790 | 0.2859
Worst 25% | 0.0575 | 0.0872 | 0.4080 | 0.4891
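The six statistics in Table 2 are standard aggregates of the per-pixel absolute depth error. A small helper of the following form (illustrative, not the authors' evaluation script) is enough to reproduce them from predicted and ground-truth depth maps; the exact definitions, e.g., the trimean as (Q1 + 2Q2 + Q3)/4, are assumed.

import numpy as np

def depth_error_metrics(z_pred, z_true):
    """RMSE, mean, median, trimean, and best/worst-25% means of |error| (same unit as z)."""
    err = np.abs(z_pred - z_true).ravel()
    q1, q2, q3 = np.percentile(err, [25, 50, 75])
    err_sorted = np.sort(err)
    n = len(err_sorted)
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "Mean": float(np.mean(err)),
        "Median": float(q2),
        "Trimean": float((q1 + 2 * q2 + q3) / 4.0),
        "Best 25%": float(np.mean(err_sorted[: n // 4])),
        "Worst 25%": float(np.mean(err_sorted[-(n // 4):])),
    }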
The third experiment is accomplished to validate the proposed technique with one of the most challenging problems for a single-shot 3D shape measurement and reconstruction technique in practical applications: there are multiple separated objects or geometric discontinuities in the region of interest. Traditionally, the single-shot FPP 3D shape measurement technique cannot cope with the geometric discontinuity problem because the relative fringe orders between two separated regions are generally lacking in a single-shot image. The ambiguities of fringe orders often directly result in an incorrect unwrapped phase map. Nonetheless, the proposed fringe-to-phase approach can handle such a geometric discontinuity problem very well because it predicts wrapped phases, and the wrapped phase has no fringe-order or phase ambiguity problem. The subsequent phase-unwrapping algorithm associated with Eq. (4) can automatically determine the correct fringe orders to obtain the full-field phase distributions. Two exemplars of 3D shape measurements and reconstructions of multiple separate objects using the proposed network are demonstrated in Fig. 9. The separation of the test objects and the discontinuities of fringe orders can be easily seen from the shadows in the background. The results firmly show that the proposed approach can handle the geometric discontinuity issue well, which generally cannot be delivered by the conventional single-shot FPP-based 3D shape measurement and reconstruction techniques.

The fourth experiment has been conducted to perform an accuracy comparison between the proposed technique and another state-of-the-art 3D shape measurement method named 3D digital image correlation (3D-DIC). The 3D-DIC is fundamentally a stereo vision-based technique that captures the target from two different perspective views with two cameras. The 3D point clouds of the target can be determined using a rigorous image correlation or matching algorithm upon knowing the camera calibration parameters [9,11]. Unlike the traditional image matching in the computer vision field, which detects the image disparities only in the horizontal direction after stereo rectification, the 3D-DIC technique detects the image disparities in both horizontal and vertical directions with an accuracy of 0.01 pixels or better. Technically, the 3D-DIC is considered a single-shot structured-light 3D shape measurement technique due to the synchronous capture action of the two cameras. In the experiment, both cameras of the capturing system are activated to carry out the 3D-DIC measurement, and the cameras are calibrated prior to the experiment. In addition, a randomly generated speckle-pattern image is projected on the target to help facilitate the DIC image matching process. Fig. 10 illustrates the measurement results of the proposed technique and the 3D-DIC technique. The first and second columns display the two different types of input images required by the proposed fringe-to-phase approach and the 3D-DIC method, respectively. It is noted that only the speckle image captured by the left camera is shown. It is determinable from the visualization that the proposed approach yields 3D shape measurement and reconstruction closer to the ground-truth shape with more structural details than the 3D-DIC technique. Two metrics of RMSE and MAE have been calculated for the selected two samples with the ground-truth labels serving as references. The proposed fringe-to-phase approach gives an RMSE of 0.1132 and an MAE of 0.0416, while the 3D-DIC method yields lower measurement accuracy with an RMSE of 0.4783 and an MAE of 0.2852. The reason is that the 3D-DIC technique relies on subset-based image matching, which lacks the ability to provide local details. Remarkably, the entire 3D reconstruction time of the proposed approach is 16 ms (3 ms for phase prediction and 13 ms for 3D reconstruction), whereas the 3D-DIC analysis takes 3.5 s.

To demonstrate the capability of the proposed technique for dynamic applications, a final experiment has been carried out to reconstruct the 3D shapes of a small rotating fan. In the experiment, the frame rate of the camera is set to 200 frames per second (fps), and the capturing commences after the fan is turned off and slows down to a speed of around 250 revolutions per minute (rpm). Such a design ensures that the fan blades in the captured images are clear and not blurred. Fig. 11 exhibits the 3D reconstruction result obtained from an image arbitrarily chosen from the captured image sequence.


Fig. 10. Measurement performance comparison of the proposed technique and the 3D-DIC technique.

The four sub-figures, following a left to right and top to bottom order in the figure, are the captured raw image, the full-field depth map in color, the front view of the 3D plot, and the depth map of the fan blades only, respectively. A video clip of the dynamic 3D reconstruction result is available in Appendix A, where the display speed is set to 10 fps (i.e., one-twentieth of the capturing speed).

Fig. 11. Dynamic 3D reconstruction of a rotating fan (see Appendix A).

4. Discussions and conclusions

Compared with the conventional structured-light methods for 3D shape measurements and reconstructions that may use fringe patterns, speckle patterns, or other texture patterns, the proposed technique works only with fringe patterns, conforming to the current standing of industrial 3D scanners for achieving high measurement accuracy. However, unlike the conventional fringe projection-based 3D shape measurement and reconstruction techniques that generally depend on using a number of images (typically over 10), the proposed one requires only a single-shot color image. The key objective is to reduce the capturing time while maintaining reasonably high accuracy.

The proposed fringe-to-phase CNN model aims to retrieve three wrapped phase maps directly from a single color image comprising three fringe patterns with designated frequencies. It does not attempt to get unwrapped phase maps directly. This scheme skips the challenging task of identifying fringe orders and instead leaves it to the tri-frequency phase unwrapping algorithm to handle. Such handling makes the network more efficient and effective in extracting the phase maps where the 3D shape features are encoded. Because the fringe-to-phase network can learn feature representations from distorted fringe patterns better than from plain images, the proposed 3D shape measurement and reconstruction technique is superior to the conventional stereo vision methods.

The proposed technique has been compared with two state-of-the-art multi-shot and single-shot 3D shape measurement techniques, i.e., the FPP 3D imaging technique and the 3D-DIC technique. Experimental results have shown that the proposed approach can carry out the 3D shape measurement with accuracy higher than the 3D-DIC technique and lower than the FPP technique. Specifically, the proposed 3D shape measurement and reconstruction technique via the fringe-to-phase network outperforms the 3D-DIC method in both speed and accuracy, and it outpaces the FPP technique with respect to speed. Despite the fact that the measurement accuracy of the proposed approach is subordinate to that of the conventional FPP 3D imaging technique, the accuracy gap is not substantial. Particularly, the notable errors originating from abrupt phase jumps at the edges of geometric discontinuities can often be easily removed through edge erosion.

In distinction to the fringe-to-depth strategy, the proposed approach is less efficient due to the additional phase-unwrapping process as well as the subsequent depth calculation. Nevertheless, its accuracy performance is superior and is worth the extra work, especially considering that Eqs. (4)–(5) are straightforward and the computation is extraordinarily fast.

During the exploration of the proposed work, a number of deep learning network variations have also been tested with the same raw images captured for dataset generation. For instance, a modified network can be employed to output the unwrapped phase instead of the wrapped phase, and another network can transform the wrapped and unwrapped phase maps to the 3D depth map. A similar network can be trained to transform a grayscale single-frequency image into three wrapped phase maps of three different frequencies. Nevertheless, they do not yield better performance than the proposed technique and are therefore not included considering the scope of the paper.

In addition to exploring more advanced neural networks and more computation-efficient training algorithms to enhance the performance of the proposed 3D shape measurement and reconstruction technique, future work will include generating much larger and broader datasets with a wide variety of objects at different scales. Despite lacking a rigorous explanation of the superior behaviors and exact guidance on performance improvement, the technique will reveal itself more as investigations and explorations proceed further. It is therefore believed that the proposed single-shot 3D shape measurement and reconstruction technique via the fringe-to-phase network will be a promising tool for countless scientific research and engineering applications in the future.


CRediT authorship contribution statement

Hieu Nguyen: Conceptualization, Methodology, Software, Formal analysis, Data curation, Visualization, Writing - original draft. Erin Novak: Methodology, Validation, Writing - editing. Zhaoyang Wang: Conceptualization, Software, Supervision, Writing - reviewing & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.measurement.2021.110663.

References

[1] A. Bud, Facing the future: The impact of Apple FaceID, Biometric Technology Today 2018 (2018) 5–7, http://dx.doi.org/10.1016/S0969-4765(18)30010-9.
[2] H. Gonzalez-Jorge, B. Riveiro, E. Vazquez-Fernandez, J. Martínez-Sánchez, P. Arias, Metrological evaluation of microsoft kinect and asus xtion sensors, Measurement 46 (2013) 1800–1806, http://dx.doi.org/10.1016/j.measurement.2013.01.011.
[3] H. Nguyen, Z. Wang, P. Jones, B. Zhao, 3D shape, deformation, and vibration measurements using infrared kinect sensors and digital image correlation, Appl. Opt. 56 (2017) 9030–9037, http://dx.doi.org/10.1364/AO.56.009030.
[4] Z. Zhang, Microsoft kinect sensor and its effect, IEEE Multimedia 19 (2012) 4–10, http://dx.doi.org/10.1109/MMUL.2012.24.
[5] J. Geng, Structured-light 3D surface imaging: A tutorial, Adv. Opt. Photon. 3 (2011) 128–160, http://dx.doi.org/10.1364/AOP.3.000128.
[6] J. Salvi, J. Pages, J. Batle, Pattern codification strategies in structured light systems, Pattern Recognit. 37 (2004) 827–849, http://dx.doi.org/10.1016/j.patcog.2003.10.002.
[7] S. Zhang, High-speed 3D shape measurement with structured light methods: A review, Opt. Lasers Eng. 106 (2018) 119–131, http://dx.doi.org/10.1016/j.optlaseng.2018.02.017.
[8] F. Bruno, G. Bianco, M. Muzzupappa, S. Barone, A. Razionale, Experimentation of structured light and stereo vision for underwater 3D reconstruction, ISPRS J. Photogramm. Remote Sens. 66 (2011) 508–518, http://dx.doi.org/10.1016/j.isprsjprs.2011.02.009.
[9] H. Kieu, T. Pan, Z. Wang, M. Le, H. Nguyen, M. Vo, Accurate 3D shape measurement of multiple separate objects with stereo vision, Meas. Sci. Technol. 25 (2014) 035401, http://dx.doi.org/10.1088/0957-0233/25/3/035401.
[10] S. Matrinez, E. Cuesta, J. Barreuri, B. Alvarez, Analysis of laser scanning and strategies for dimensional and geometrical control, Int. J. Adv. Manuf. Technol. 46 (2010) 621–629, http://dx.doi.org/10.1007/s00170-009-2106-8.
[11] Z. Wang, H. Kieu, H. Nguyen, M. Le, Digital image correlation in experimental mechanics and image registration in computer vision: Similarities, differences and complements, Opt. Lasers Eng. 65 (2015) 18–27, http://dx.doi.org/10.1016/j.optlaseng.2014.04.002.
[12] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, S. Bernt, The cityscapes dataset for semantic urban scene understanding, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2016, pp. 3213–3223, http://dx.doi.org/10.1109/CVPR.2016.350.
[13] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 234–241, http://dx.doi.org/10.1007/978-3-319-24574-4_28.
[14] Z. Zhao, P. Zheng, S. Xu, X. Wu, Object detection with deep learning: A review, IEEE Trans. Neural Netw. Learn. Syst. 30 (2019) 3212–3232, http://dx.doi.org/10.1109/icABCD49160.2020.9183866.
[15] B. Lin, S. Fu, C. Zhang, F. Wang, Y. Li, Optical fringe patterns filtering based on multi-stage convolution neural network, Opt. Lasers Eng. 126 (2020) 105853, http://dx.doi.org/10.1016/j.optlaseng.2019.105853.
[16] K. Wang, Y. Li, Q. Kemao, J. Di, J. Zhao, One-step robust deep learning phase unwrapping, Opt. Express 27 (2019) 15100–15115, http://dx.doi.org/10.1364/OE.27.015100.
[17] K. Yan, Y. Yu, C. Huang, L. Sui, K. Qian, A. Asundi, Fringe pattern denoising based on deep learning, Opt. Commun. 437 (2019) 148–152, http://dx.doi.org/10.1016/j.optcom.2018.12.058.
[18] J. Zhang, X. Tian, J. Shao, H. Luo, R. Liang, Phase unwrapping in optical metrology via denoised and convolutional segmentation networks, Opt. Express 27 (2019) 14903–14912, http://dx.doi.org/10.1364/OE.27.014903.
[19] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, N. Navab, Deeper depth prediction with fully convolutional residual networks, in: International Conference on 3D Vision, 3DV, 2016, pp. 239–248, http://dx.doi.org/10.1109/3DV.2016.32.
[20] D. Eigen, C. Puhrsch, R. Fergus, Depth map prediction from a single image using a multi-scale deep network, in: International Conference on Neural Information Processing Systems, 2014, pp. 2366–2374, http://dx.doi.org/10.5555/2969033.2969091.
[21] F. Liu, C. Shen, G. Lin, Deep convolutional neural fields for depth estimation from a single image, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2015, pp. 5162–5170, http://dx.doi.org/10.1109/CVPR.2015.7299152.
[22] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, Y. Jiang, Pixel2Mesh: Generating 3D mesh models from single RGB images, in: Computer Vision – ECCV, 2018, pp. 55–71, http://dx.doi.org/10.1007/978-3-030-01252-6_4.
[23] Y. Cao, Z. Liu, Z. Kuang, L. Kobbelt, S. Hu, Learning to reconstruct high-quality 3D shapes with cascaded fully convolutional networks, in: Computer Vision – ECCV, 2018, pp. 616–633, http://dx.doi.org/10.1007/978-3-030-01240-3_38.
[24] R. Furukawa, D. Miyazaki, M. Baba, S. Hiura, H. Kawasaki, Robust structured light system against subsurface scattering effects achieved by CNN-based pattern detection and decoding algorithm, in: European Conference on Computer Vision, ECCV Workshops, 2018, http://dx.doi.org/10.1007/978-3-030-11009-3_22.
[25] S. Fanello, C. Rhemann, V. Tankovich, A. Kowdle, S. Escolano, D. Kim, S. Izadi, Hyperdepth: Learning depth from structured light without matching, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2016, pp. 5441–5450, http://dx.doi.org/10.1109/CVPR.2016.587.
[26] H. Wang, J. Yang, W. Liang, X. Tong, Deep single-view 3D object reconstruction with visual hull embedding, in: AAAI Conference on Artificial Intelligence, 2019, pp. 8941–8948, http://dx.doi.org/10.1609/AAAI.V33I01.33018941.
[27] H. Nguyen, K. Ly, T. Tran, Y. Wang, Z. Wang, Hnet: Single-shot 3D shape reconstruction using structured light and h-shaped global guidance network, Res. Opt. 4 (2021) 100104, http://dx.doi.org/10.1016/j.rio.2021.100104.
[28] H. Nguyen, Y. Wang, Z. Wang, Single-shot 3D shape reconstruction using structured light and deep convolutional neural networks, Sensors 20 (2020) 3718, http://dx.doi.org/10.3390/s20133718.
[29] P. Yao, S. Gai, F. Da, Coding-Net: A multi-purpose neural network for fringe projection profilometry, Opt. Commun. 489 (2021) 126887, http://dx.doi.org/10.1016/j.optcom.2021.126887.
[30] Y. Zheng, S. Wang, Q. Li, B. Li, Fringe projection profilometry by conducting deep learning from its digital twin, Opt. Express 28 (2020) 36568–36583, http://dx.doi.org/10.1364/OE.410428.
[31] S. Jeught, J. Dirckx, Deep neural networks for single shot structured light profilometry, Opt. Express 27 (2019) 17091–17101, http://dx.doi.org/10.1364/OE.27.017091.
[32] H. Nguyen, T. Tran, Y. Wang, Z. Wang, Three-dimensional shape reconstruction from single-shot speckle image using deep convolutional neural networks, Opt. Lasers Eng. 143 (2021) 106639, http://dx.doi.org/10.1016/j.optlaseng.2021.106639.
[33] F. Wang, C. Wang, Q. Guan, Single-shot fringe projection profilometry based on deep learning and computer graphics, Opt. Express 29 (2021) 8024–8040, http://dx.doi.org/10.1364/OE.418430.
[34] H. Nguyen, D. Nicole, H. Li, Y. Wang, Z. Wang, Real-time 3D shape measurement using 3LCD projection and deep machine learning, Appl. Opt. 58 (2019) 7100–7109, http://dx.doi.org/10.1364/AO.58.007100.
[35] W. Yin, Q. Chen, S. Feng, T. Tao, L. Huang, M. Trusiak, A. Asundi, C. Zuo, Temporal phase unwrapping using deep learning, Sci. Rep. 9 (2019) 20175, http://dx.doi.org/10.1038/s41598-019-56222-3.
[36] J. Shi, X. Zhu, H. Wang, L. Song, Q. Guo, Label enhanced and patch based deep learning for phase retrieval from single frame fringe pattern in fringe projection 3D measurement, Opt. Express 27 (2019) 28929–28943, http://dx.doi.org/10.1364/OE.27.028929.
[37] R. Machineni, G.E. Spoorthi, K. Vengala, S. Gorthi, R. Gorthi, End-to-end deep learning-based fringe projection framework for 3D profiling of objects, Comput. Vis. Image Underst. 199 (2020) 103023, http://dx.doi.org/10.1016/j.cviu.2020.103023.
[38] G. Qiao, Y. Huang, Y. Song, H. Yue, Y. Liu, A single-shot phase retrieval method for phase measuring deflectometry based on deep learning, Opt. Commun. 476 (2020) 126303, http://dx.doi.org/10.1016/j.optcom.2020.126303.


[39] J. Qian, S. Feng, T. Tao, Y. Hu, Q. Chen, C. Zuo, Deep-learning-enabled geometric constraints and phase unwrapping for single-shot absolute 3D shape measurement, APL Photonics 5 (2020) 046105, http://dx.doi.org/10.1063/5.0003217.
[40] J. Liang, J. Zhang, J. Shao, B. Song, B. Yao, R. Liang, Deep convolutional neural network phase unwrapping for fringe projection 3D imaging, Sensors 20 (2020) 3691, http://dx.doi.org/10.3390/s20133691.
[41] S. Feng, Q. Chen, G. Gu, T. Tao, L. Zhang, Y. Hu, W. Yin, C. Zuo, Fringe pattern analysis using deep learning, Adv. Photonics 1 (2019) 025001, http://dx.doi.org/10.1117/1.AP.1.2.025001.
[42] P. Yao, S. Gai, Y. Chen, W. Chen, F. Da, A multi-code 3D measurement technique based on deep learning, Opt. Lasers Eng. 143 (2021) 106623, http://dx.doi.org/10.1016/j.optlaseng.2021.106623.
[43] W. Li, J. Yu, S. Gai, F. Da, Absolute phase retrieval for a single-shot fringe projection profilometry based on deep learning, Opt. Eng. 60 (6) (2021) 064104, http://dx.doi.org/10.1117/1.OE.60.6.064104.
[44] W. Hu, H. Miao, K. Yan, Y. Fu, A fringe phase extraction method based on neural network, Sensors 21 (2021) 1664, http://dx.doi.org/10.3390/s21051664.
[45] P.S. Huang, Q. Hu, F. Jin, F.-P. Chiang, Color-encoded digital fringe projection technique for high-speed 3-D surface contouring, Opt. Eng. 38 (6) (1999) 1065–1071, http://dx.doi.org/10.1117/1.602151.
[46] S. Feng, C. Zuo, W. Yin, G. Gu, Q. Chen, Micro deep learning profilometry for high-speed 3D surface imaging, Opt. Lasers Eng. 121 (2019) 416–427, http://dx.doi.org/10.1016/j.optlaseng.2019.04.020.
[47] J. Qian, S. Feng, Y. Li, T. Tao, J. Han, Q. Chen, C. Zuo, Single-shot absolute 3D shape measurement with deep-learning-based color fringe projection profilometry, Opt. Lett. 45 (2020) 1842–1845, http://dx.doi.org/10.1364/OL.388994.
[48] H. Nguyen, Z. Wang, Accurate 3D shape reconstruction from single structured-light image via fringe-to-fringe network, Photonics 8 (2021) 459, http://dx.doi.org/10.3390/photonics8110459.
[49] H. Le, H. Nguyen, Z. Wang, J. Opfermann, S. Leonard, A. Krieger, J. Kang, Demonstration of a laparoscopic structured-illumination three-dimensional imaging system for guiding reconstructive bowel anastomosis, J. Biomed. Opt. 23 (2018) 056009, http://dx.doi.org/10.1117/1.JBO.23.5.056009.
[50] H. Nguyen, D. Nguyen, Z. Wang, H. Kieu, M. Le, Real-time, high-accuracy 3D imaging and shape measurement, Appl. Opt. 54 (2015) A9–A17, http://dx.doi.org/10.1364/AO.54.0000A9.
[51] H. Nguyen, J. Liang, Y. Wang, Z. Wang, Accuracy assessment of fringe projection profilometry and digital image correlation techniques for three-dimensional shape measurements, J. Phys. Photonics 3 (2021) 014004, http://dx.doi.org/10.1088/2515-7647/abcbe4.
[52] Z. Wang, D. Nguyen, J. Barnes, Some practical considerations in fringe projection profilometry, Opt. Lasers Eng. 48 (2010) 218–225, http://dx.doi.org/10.1016/j.optlaseng.2009.06.005.
[53] H. Du, Z. Wang, Three-dimensional shape measurement with an arbitrarily arranged fringe projection profilometry system, Opt. Lett. 32 (2007) 2438–2440, http://dx.doi.org/10.1364/OL.32.002438.
[54] M. Vo, Z. Wang, B. Pan, T. Pan, Hyper-accurate flexible calibration technique for fringe-projection-based three-dimensional imaging, Opt. Express 20 (2012) 16926–16941, http://dx.doi.org/10.1364/OE.20.016926.
[55] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, The MIT Press, Cambridge, Massachusetts, 2016.
[56] A. Maas, A. Hannun, A. Ng, Rectifier nonlinearities improve neural network acoustic models, in: Proc. International Conference on Machine Learning, ICML, vol. 28, 2013, p. 1.
[57] D. Kingma, J. Ba, Adam: A method for stochastic optimization, in: International Conference on Learning Representations, ICLR, 2015, p. 15, arXiv:1412.6980.
[58] D. Wilson, T. Martinez, The general inefficiency of batch training for gradient descent learning, Neural Netw. 16 (2003), http://dx.doi.org/10.1016/S0893-6080(03)00138-2.

