
2019 19th International Symposium on Communications and Information Technologies (ISCIT)

A Review of Stereo-Photogrammetry Method for 3-D Reconstruction in Computer Vision

Phuong Ngoc Binh Do1,2, Quoc Chi Nguyen1

1 Department of Mechatronics, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
2 Department of Information Technology, Saigon Institute of Technology, Ho Chi Minh City, Vietnam

Abstract—3-D reconstruction has been a major concern since the first days of computer vision (CV). By going through the history of 3-D reconstruction, we show this field's contribution to society and the necessity of improving system complexity, speed, and accuracy. Although various algorithms and methods have been introduced, stereo-photogrammetry, which applies the binocular vision principle, is the closest to the human vision system. Motivated by improving stereo-photogrammetry and applying it to robot manipulation, we review the method in all three known-to-date categories: local, global, and semi-global. In addition, it is advisable to rectify the stereo images before matching to find depth and reconstruct 3-D models; therefore, some rectification methods are demonstrated. Finally, we discuss our findings and future work related to stereo-photogrammetry.

Fig. 1. A rough timeline of some of the most active topics of research in computer vision [1, pp. 10].

Keywords—stereo-photogrammetry, stereo matching, 3-D reconstruction, computer vision, rectification, disparity.
I. INTRODUCTION

Computer vision (CV) is the study of automatically extracting useful information from digital photos and videos by computer, mimicking the vision system of a human. CV is challenging because it solves an inverse problem [1, pp. 3], which demands combined knowledge of many fields such as mathematics and physics. Szeliski [1, pp. 10] gives, in Figure 1, a rough timeline of the most active topics in CV. There are various concerns in CV, such as image formation, image processing, feature detection and matching, image segmentation, and 3-D reconstruction [1, pp. 5]. This paper, however, focuses on stereo-photogrammetry as a method for 3-D reconstruction. Historically, 3-D reconstruction has been developing along with CV for decades. In the 1980s, according to [1, pp. 12], many CV algorithms focused on processing three-dimensional (3-D) data, such as modeling [2], acquisition [3], merging [4], and recognition [5], were introduced. One of the highlights was the introduction of discrete Markov Random Field (MRF) models for optimization in global image search, which was applied in later years to find disparity maps and acquire depth information from images [7-9]. Later, in the 1990s, tracking and image segmentation techniques were significantly improved [10-11], which led to multi-view stereo algorithms for 3-D reconstruction [1, pp. 14]. In the 2000s, more efficient algorithms for complex global optimization brought 3-D processing into other fields such as medicine, archeology, and robotics [1, pp. 16]. Figure 2 shows 3-D reconstruction algorithms used to (a) process MRI images and diagnose diseases, (b) accurately estimate ice floes' age and size from images, and (c) reconstruct the 3-D environment in which mobile robots operate. Recently, machine learning techniques have shown their effectiveness in applying Artificial Intelligence (AI) to CV technology [1] [12-15]. For instance, [15] used 3-D vision and machine learning to train robots to grasp previously unseen objects at specific points.

Fig. 2. Applications of CV in (a) medical, (b) archeology and (c) robotics [12, pp. 2–3].

As mentioned above, the early research on 3-D vision systems has supported tremendous applications for the visually impaired and in robotics, navigation, the games industry, augmented reality (AR), and virtual reality (VR) [13]. Although improvements in 2-D image processing algorithms such as line detection, image segmentation, feature-based object detection, and global optimization offer more reliable 3-D recovery techniques, they are still far from reproducing the natural visual perception of a two-year-old human [1, pp. 3]. [16-17] classified the techniques into passive and active, as shown in Figure 3.

Fig. 3. 3-D image reconstruction techniques [16].

Active techniques contact an object or project some form of energy (light, laser, or ultrasound) onto it to measure the distance from a sensor to that object; the set of distances collected from the sensor is then employed to reconstruct the geometric data of the object. The sensors used in these techniques are mostly based on the Time-of-Flight (ToF) principle, where the distance can be extracted from the return time of the reflected waves or from the phase shift between the illumination and the reflection, as shown in Figure 4.
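The ToF ranging just described reduces to simple arithmetic: the sensor-to-object distance is half the distance the wave travels during the measured round trip. A minimal sketch follows; the 20 ns round-trip time is a hypothetical value chosen only to show the arithmetic.

```python
# Time-of-Flight distance sketch: the distance from the sensor to the object
# is half the round-trip distance travelled by the emitted wave (here, light).
C = 299_792_458.0  # speed of light, m/s


def tof_distance(round_trip_seconds: float, wave_speed: float = C) -> float:
    """Distance from the sensor to the object for a reflected pulse."""
    return wave_speed * round_trip_seconds / 2.0


# Example: a light pulse that returns after a (hypothetical) 20 ns round trip
d = tof_distance(20e-9)  # roughly 3 m
```

The same relation applies to ultrasound by substituting the speed of sound for `wave_speed`; phase-shift ToF cameras recover the round-trip time indirectly but use the same conversion.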

978-1-7281-5009-3/19/$31.00 ©2019 IEEE
Fig. 4. The principle of ToF depth camera [18].

Passive techniques use an image sensor (semiconductor charge-coupled device or metal-oxide-semiconductor) to record the reflected/emitted radiance from the surfaces of objects. The signals collected from the image sensors are converted to digital data, which is employed to build the 3-D images of the objects. It should be noted that the digital data represent the projection of the real scene onto the plane of the image sensor, i.e., a 2-D image of the 3-D scene. Figure 5 shows the principle of reconstructing the 3-D image from 2-D stereo images. Accuracy is one of the highlighted advantages of these methods. However, in outdoor or poorly controlled environments, the reflection of light may generate noise in the 2-D images and yield poor results from image processing. A common solution to eliminate the noise is hardware improvement (e.g., image sensor, optical lens, and signal processing chip), but this results in high cost. Another solution is to develop improved algorithms that employ commercial cameras. One of these methods is called photogrammetry and has been the focus of numerous studies for 50 years [13].

With machine learning, photogrammetry, as an image-based 3-D reconstruction method, enables the use of an enormous amount of online visual information. In addition, photogrammetry is a suitable answer for robots' visual perception in terms of system mobility, complexity, cost, and performance. For example, medicine has benefited from applications of photogrammetry in analyzing MRI and CT data [19], and the robotics industry uses 3-D visual systems for environment detection [12, pp. 2], grasping [20], and welding [21]. Photogrammetry, therefore, is heavily studied and continuously improved. One photogrammetry system uses an extra light source (typically a projector or a laser) with a single camera to increase accuracy, but it performs poorly outdoors and in poorly controlled surroundings. Another system uses two cameras and is therefore called stereo-photogrammetry; it is the focus of this paper because of its continuous improvement even under complicated radiometric conditions between images [22]. As an alternative, a moving camera can be used to capture two or more images for depth calculation, but this takes more processing time than a stereo camera. Section II presents the stereo-photogrammetry method. In detail, Section II-A illustrates the basic formula for depth calculation with stereo-photogrammetry, in which the stereo images need to be rectified and used for disparity search. The rectification process is then discussed in Section II-B. Finally, the stereo-matching algorithms used to find the disparity maps are reviewed in Section II-C. In the conclusion, we briefly introduce our current and future work on this method.

II. STEREO-PHOTOGRAMMETRY

The key concept is binocular vision. In biology, it is a type of vision that uses two eyes to perceive depth by overlapping the fields of view of the eyes. In recent years, there has been massive application of this concept in technology, namely Virtual Reality (VR). The fact that big technology companies such as Microsoft, Sony, and Facebook have been investing in the research and development of VR sets a research trend in this field. Binocular vision makes use of two cameras (hence the name "stereo"), simulating the mechanism of the human eyes to reproduce the missing depth information. Yan et al. [23] show two possible arrangements for the stereo cameras, as shown in Figure 5.

Fig. 5. Camera arrangements for stereo-photogrammetry: (a) converged camera axes, (b) parallel camera axes [23].

A. Stereo-Photogrammetry Depth Formula

The parallel setup is favorable for depth calculation, and the process of converting the other arrangements into the parallel one is called rectification. According to [12, pp. 290–291], by following the geometry of the binocular arrangement in Figure 6, the formula for the depth parameter can be deduced as follows.

Fig. 6. Geometry of binocular arrangement [12, pp. 290].

In the parallel-axes setup, an arbitrary point P in the scene of the two cameras is projected onto both cameras' image planes at x_L and x_R; those projected points are called a conjugate pair (or corresponding points). The plane formed by the center points of the cameras' lenses and P is an epipolar plane, and the intersections of this plane with the two image planes are called epipolar lines. Clearly, the conjugate pair of an arbitrary point P lies on the corresponding epipolar lines in each image. However, there is a displacement between the positions of the corresponding points, D = x_L − x_R, which is called the disparity.

Assume the origin of the coordinate system is at the center of the left camera lens. Without loss of generality, applying Thales' theorem to the triangle formed by P and the center of the left lens, with the image plane as a line parallel to one side of the triangle, we have

x / z = x_L / f.    (1)

Similarly, for the triangle formed with the center of the right lens and the corresponding parallel line,
(x − b) / z = x_R / f.    (2)

Finally, the formula for depth is deduced from the two equations above:

z = b f / (x_L − x_R) = b f / D.    (3)

In Eq. (3), b is the baseline (the distance between the two lens centers) and f is the focal length. The formula raises two sub-issues that need to be solved: 1. how to rectify the images, and 2. how to find the disparity map D from a pair of rectified stereo images.

B. Image Rectification

Fig. 7. Stereo images (a) before and (b) after rectification.

The conjugate (corresponding) pair of an arbitrary point on the object lies on the corresponding pair of epipolar lines in the stereo images. However, as in Figure 7(a), it is a tremendous task to find the corresponding epipolar lines over the whole image. The rectification process transforms the images so that the corresponding epipolar lines become parallel and lie on the same horizontal lines, Figure 7(b). The disparity searching problem is therefore reduced from whole-image search to line search.

There are cases where rectification is impossible, e.g., pushbroom images [22], whose epipolar lines are not straight but hyperbolic. The author of [22] suggested aggregating the matching cost from 16 directions to cover the 2-D image. In most cases, however, it is feasible and advisable to rectify the stereo images to reduce the computational complexity.

The literature offers various methods for image rectification. [25] names a few: Robert et al.'s [26] algorithm minimizes the distortion of orthogonality around the centers of the photos; Hartley [27] minimizes the parallax within the corresponding points used for the rectification process, which is applied in the stereoRectifyUncalibrated() function in OpenCV; and Loop and Zhang [24] proposed decomposing the transformation into three smaller steps, a projective transformation, a similarity transformation, and a shearing transformation, and then minimizing the distortion of each transformation. The final distortion varies from method to method; generally, however, all the algorithms try to rectify the stereo images with the least distortion that makes the epipolar lines parallel and on the same horizontal lines. Figure 8 shows test results for Hartley's [27] and Loop and Zhang's [24] methods; the former shows some distortion compared to the latter.

Fig. 8. Hartley's method (a) shows some distortion (red circle) while Zhang's (b) does not.

C. Stereo Matching

Disparity searching, a.k.a. stereo matching or visual correspondence, is the process of finding the corresponding pairs in the stereo images and obtaining the disparity maps from their positions. This problem has been intensively investigated in the CV community [28-29]. The number of methods is still increasing, and there is an online platform, the Middlebury benchmark, that keeps track of new methods. The methods are often evaluated by comparing the obtained disparity maps with ground-truth images (i.e., the actual depth images). Currently, no method yields both the most reliable and the fastest result, which opens the gate for research on improving performance and speed.

Scharstein et al. [29] generalized the process into four steps: 1. matching cost computation, 2. cost (support) aggregation, 3. disparity computation/optimization, and 4. disparity refinement; while Fuhr et al. [28] divided the methods into two categories: local and global. However, both local and global approaches go through the above steps, which are called the taxonomy of stereo matching.

Local techniques use the information surrounding a pixel to calculate its disparity locally. Applying the taxonomy, in the first and second steps, common algorithms are squared intensity differences (SSD), absolute intensity differences (SAD), normalized cross-correlation, and binary matching cost [28]. Overall, the algorithm that represents this step and is most used is SAD. For example, [30] used this technique to calculate the matching cost for a mobile robot. The formula for SAD is

SAD(x, y, D) = Σ_{(i, j) ∈ W(x, y)} |I_L(i, j) − I_R(i − D, j)|.    (4)

In Eq. (4), W(x, y) is the surrounding window with the concerned pixel in the middle (the window dimensions are therefore odd integers); I(i, j) is the intensity of the pixel at coordinate (i, j); and D is the disparity between the concerned pixels. Figure 9 gives a visual example of the SAD matching cost [30]. The matching cost is the absolute difference (AD) between the pixels (converted into grayscale) in the search windows from the stereo images, and the aggregation step takes the sum of these differences, hence the name SAD. Finally, the two concerned pixels with the least SAD (over the windows around them) are taken as corresponding, using the "winner-take-all" (WTA) method.

Fig. 9. SAD cost aggregation example [30].

The advantage of this technique is its low computation cost compared to global methods, which try to optimize a global energy; hence, local techniques are typically faster. However, the results are noisier [28]. The local algorithm can be improved by minimizing over an image row (i.e., a pair of epipolar lines after rectification) instead of the local cost for a pixel. A disparity space image (DSI) is created from each pair of corresponding epipolar lines, and dynamic programming is then used to search for the optimal path from the top left to the bottom right of the DSI; the resulting path gives the disparity map of the points on that scanline [31].
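The SAD matching cost of Eq. (4), the winner-take-all selection, and the depth conversion of Eq. (3) can be sketched together in a few lines of NumPy. The stereo pair below is synthetic (a random left image with a known uniform shift), and the window size, disparity range, baseline b, and focal length f are illustrative values, not parameters taken from [30] or [12]:

```python
import numpy as np


def sad_disparity(left, right, max_disp=6, half_win=2):
    """Winner-take-all SAD block matching on a rectified pair (sketch of Eq. 4)."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(half_win, h - half_win):
        for x in range(half_win + max_disp, w - half_win):
            win_l = left[y - half_win:y + half_win + 1,
                         x - half_win:x + half_win + 1]
            best_d, best_cost = 0, float("inf")
            for d in range(max_disp + 1):
                win_r = right[y - half_win:y + half_win + 1,
                              x - d - half_win:x - d + half_win + 1]
                cost = int(np.abs(win_l - win_r).sum())  # Eq. (4): SAD over W(x, y)
                if cost < best_cost:                      # winner-take-all (WTA)
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp


# Synthetic rectified pair: every pixel of the right image satisfies
# I_R(x - D, y) = I_L(x, y) for a known disparity D = 3.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(20, 40))
true_d = 3
right = np.zeros_like(left)
right[:, :-true_d] = left[:, true_d:]

disp = sad_disparity(left, right)

# Eq. (3): disparity (pixels) -> depth, with a hypothetical baseline
# b = 0.1 m and focal length f = 500 px (illustrative values only).
b, f = 0.1, 500.0
depth = np.where(disp > 0, b * f / np.maximum(disp, 1), 0.0)
```

On this noise-free pair the interior of the disparity map recovers D = 3 exactly; on real images the same loop produces the noisier results discussed above, which is what the aggregation and optimization steps of the taxonomy address.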
However, this method introduces streaking because the relationships between the DSIs of the whole image are not considered. Figure 10 shows a DSI example.

Fig. 10. (a) DSI of a scanline, and (b) regularization process of the DSI [31].

In contrast, global techniques normally skip the aggregation step (step 2 of the taxonomy) and introduce an energy-optimizing framework that contains a data term and a smoothness term [22][29]. By adding penalties for occlusions (where parts of an object are visible in the left image but blocked in the right image, and vice versa) [32-33], visibility treatment [34-36], ensuring stereo-consistent symmetry [34-37], or using segmentation data for smoothness-term weighting [36], the smoothness of the disparity map is enhanced. First of all, the energy function is defined as

E(D) = E_data(D) + E_smooth(D).    (5)

In Eq. (5), E_data(D) is the sum of the initial or aggregated matching costs, which shows to what extent the disparity function D agrees with the pair of stereo images, and the E_smooth(D) term keeps the disparity map robust by penalizing differences between neighboring disparities. Many approaches are then used to optimize the energy function, e.g., traditional algorithms with regularization and Markov Random Fields (MRF) such as continuation [38], simulated annealing [39-41], highest confidence first [42], and mean field annealing [43]; or, more recently, graph cuts [44-48] and belief propagation [34]. Although most top-ranked methods are global and produce smooth and robust results, their disadvantage is the computation cost; in other words, local techniques tend to be simpler and faster than global ones [22][28].

The semi-global matching method, therefore, was introduced to combine the low complexity of local methods (by calculating the matching cost in 1-D) with global smoothness (by aggregating the matching cost from all directions equally, i.e., at least 8 and up to 16 directions to cover the 2-D image well) [22]. This idea also solves cases where rectification is not feasible. First, the costs are calculated using any known method, e.g., SAD. The aggregated cost S(p, D) for a pixel p (with disparity D) is the sum of all 1-D optimal cost paths that end in pixel p (at disparity D); see Figure 11. The method is formulated, as below, for the cost L_r(p, D) belonging to a path in direction r toward pixel p (with disparity D). The author subtracts the minimum path cost of the previous pixel, min_k L_r(p − r, k), to avoid the accumulation of costs along the path:

L_r(p, D) = C(p, D) + min( L_r(p − r, D),
                           L_r(p − r, D − 1) + P1,
                           L_r(p − r, D + 1) + P1,
                           min_i L_r(p − r, i) + P2 ) − min_k L_r(p − r, k).    (6)

S(p, D) = Σ_r L_r(p, D).    (7)

In Figure 11, paths that are not horizontal, vertical, or diagonal are implemented by moving one step horizontally (or vertically) and then one step diagonally. In Eq. (7), the aggregated cost S(p, D) is the sum of L_r over all directions r; the disparity D is then selected with respect to the minimum cost, min_D S(p, D). If D_i is the number of possible disparities, the calculation cost is O(D_i) at each pixel. The algorithm also goes through each pixel 16 times; therefore, the total complexity is O(W H D_i). In [22], the author applied integer-based SIMD (Single Instruction, Multiple Data) assembler instructions for parallel processing, significantly reducing the runtime. Moreover, [22] used mutual information (MI), based on the entropies of the images, to calculate the matching cost; the results are proven to be insensitive to intensity changes [49], see Figure 12.

Fig. 11. Aggregation of costs: 16 paths from all directions r [22].

Finally, depending on the application, disparity refinement may or may not be carried out. Normally, disparities estimated in a discretized space are good enough for robot navigation or object/human tracking. However, they are not appealing in image-based rendering, e.g., quantized maps. As a result, sub-pixel refinement techniques apply gradient descent and curve fitting to the matching cost at discrete disparity levels [50-52]. However, the results depend on two conditions: 1. smoothly varying intensities, and 2. the computed regions lying on the same surface [29]. [53] mentioned the possibility of fitting correlation curves to integer-sampled matching costs, but the conditions for good results need to be investigated further [29].

Fig. 12. (a) Left image, (b) right image, synthetically altered, (c) result from a traditional stereo algorithm, (d) result from the MI method [49].
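The recurrence of Eq. (6) can be sketched for a single left-to-right path on one scanline. The cost volume below is random, and the penalties P1 and P2 are illustrative values, not the tuned settings of [22]:

```python
import numpy as np


def aggregate_path(cost, p1=1.0, p2=4.0):
    """Sketch of Eq. (6) along one left-to-right path r on a scanline.

    cost[x, d] is the per-pixel matching cost C(p, d) (e.g. from SAD);
    returns the path-aggregated cost L_r[x, d].
    """
    w, ndisp = cost.shape
    L = np.zeros((w, ndisp))
    L[0] = cost[0]
    for x in range(1, w):
        prev = L[x - 1]
        prev_min = prev.min()                       # min_k L_r(p - r, k)
        for d in range(ndisp):
            best = prev[d]                          # same disparity: no penalty
            if d > 0:
                best = min(best, prev[d - 1] + p1)  # disparity change of 1: P1
            if d < ndisp - 1:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)         # larger change: P2
            # subtracting prev_min keeps L_r bounded along the path
            L[x, d] = cost[x, d] + best - prev_min
    return L


rng = np.random.default_rng(1)
scanline_cost = rng.uniform(0.0, 8.0, size=(12, 5))  # 12 pixels, 5 disparities
L_r = aggregate_path(scanline_cost)
```

Full semi-global matching repeats this aggregation along 8–16 directions, sums the per-direction costs per Eq. (7), and selects the disparity with the minimum S(p, D); the subtraction of the previous pixel's minimum guarantees that L_r never exceeds the local cost by more than P2, as stated in [22].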
III. CONCLUSIONS

The Middlebury benchmark, mentioned earlier, is a reliable online platform for keeping track of the continuous developments in stereo matching. The newest method is dated 2019 (the publishing year of this paper), showing that this is currently an active research topic.

This review shows the possibility of using stereo-photogrammetry to build 3-D vision systems. We observe that in local techniques such as SAD, the matching costs are redundantly calculated many times, and it is unnecessary to search for the depth of an object's background. In addition, research shows that global approaches incur more computation cost than local methods. Although there is an OpenCV implementation of the semi-global method, StereoSGBM(), based on Hirschmuller's work [22], it does not make use of Mutual Information (MI) to calculate the matching cost (which, as proven by the author, yields good matching results even under changing intensities). Therefore, the investigation of the accuracy and performance of the stereo-photogrammetry method should be continued.

Some stereo-photogrammetry methods reviewed in this paper date back a decade. However, they are the fundamental knowledge that has led to many state-of-the-art algorithms. For example, in 2012, Sah and Jotwani showed an improved stereo matching method based on a correlation method [54]. In [55], published in 2018, a new method called Cyclops2 was introduced, which minimizes a weight function by applying the idea of calculating SAD (discussed earlier in this paper). Even in a paper only a few months old (Mar. 2019) [56], the authors still use dynamic programming to calculate the disparity. Overall, most of the new stereo-matching methods aim to improve the calculation time, because stereo matching is "computationally intensive" [55]. One of the known solutions, parallel computation with graphics processing unit (GPU) technology, is also our focus. Some other current research focuses on improving the performance of stereo matching by applying convolutional neural networks (CNN), such as the MC-CNN methods that use a CNN to compute the matching cost but still use Semi-Global Matching (SGM, first introduced in 2008 [22]) to perform the matching [55]. However, those topics are beyond the scope of this paper.

REFERENCES

[1] R. Szeliski, "Introduction," in Computer Vision: Algorithms and Applications. England: Springer, 2011.
[2] G. J. Agin and T. O. Binford, "Computer Description of Curved Objects," IEEE Transactions on Computers, vol. C-25, no. 4, Apr. 1976, pp. 439–449.
[3] O. D. Faugeras and M. Hebert, "The Representation, Recognition and Positioning of 3-D Shapes from Range Data," in T. Kanade (ed.), Three-Dimensional Machine Vision, Boston, MA, USA: Kluwer Academic Publishers, 1987, pp. 301–353.
[4] B. Curless and M. Levoy, "A Volumetric Method for Building Complex Models from Range Images," in ACM SIGGRAPH Conference Proceedings, New Orleans, 1996, pp. 303–312.
[5] P. J. Besl and R. C. Jain, "Three-Dimensional Object Recognition," Computing Surveys, vol. 17, no. 1, 1985, pp. 75–145.
[6] A. Banno, T. Masuda, T. Oishi and K. Ikeuchi, "Flying Laser Range Sensor for Large Scale Site Modeling and its Applications in Bayon Digital Archival Project," International Journal of Computer Vision, vol. 78, no. 2–3, 2008, pp. 207–222.
[7] C. J. Taylor, D. J. Kriegman and P. Anandan, "Structure and Motion in Two Dimensions from Multiple Images: A Least Squares Approach," in Proceedings of the IEEE Workshop on Visual Motion, Princeton, NJ, USA, 1991, pp. 242–248.
[8] R. Szeliski and S. B. Kang, "Recovering 3D Shape and Motion from Image Streams Using Nonlinear Least Squares," Journal of Visual Communication and Image Representation, vol. 5, no. 1, 1994, pp. 10–28.
[9] A. Azarbayejani and A. P. Pentland, "Recursive Estimation of Motion, Structure, and Focal Length," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 6, June 1995, pp. 562–575.
[10] S. M. Seitz and C. M. Dyer, "Photorealistic Scene Reconstruction by Voxel Coloring," International Journal of Computer Vision, vol. 35, no. 2, 1999, pp. 151–173.
[11] K. N. Kutulakos and S. M. Seitz, "A Theory of Shape by Space Carving," International Journal of Computer Vision, vol. 38, no. 3, 2000, pp. 199–218.
[12] R. Jain, R. Kasturi, and B. G. Schunck, "Introduction," in Machine Vision. New York, USA: McGraw-Hill, 1995.
[13] M. Siudak and P. Rokita, "A Survey of Passive 3D Reconstruction Methods on the Basis of More than One Image," Machine Graphics and Vision, vol. 23, no. 3, 2014, pp. 57–117.
[14] X. Li and Y. Shi, "Computer Vision Imaging Based on Artificial Intelligence," in International Conference on Virtual Reality and Intelligent Systems (ICVRIS), Changsha, 2018, pp. 22–25.
[15] R. F. Peter, M. Lucas, and T. Russ, "Dense Object Nets: Learning Dense Visual Object Descriptors by and for Robotic Manipulation," in Proceedings of the 2nd Conference on Robot Learning, PMLR, vol. 87, 2018, pp. 373–385.
[16] R. Khilar, S. Chitrakala and S. SelvamParvathy, "3D Image Reconstruction: Techniques, Applications and Challenges," in International Conference on Optical Imaging Sensor and Security (ICOSS), Coimbatore, 2013, pp. 1–6.
[17] B. Julius, G. Iñigo, G. C. Luis and F. E. Carlos, "3D Reconstruction Methods, a Survey," in Proceedings of the First International Conference on Computer Vision Theory and Applications, 2006, pp. 457–463.
[18] S. Lee, "Depth Camera Image Processing and Applications," in 19th IEEE International Conference on Image Processing, Orlando, FL, 2012, pp. 545–548.
[19] A. Tsai, J. W. Fisher, C. Wible, W. M. Wells, J. Kim, and A. S. Willsky, "Analysis of Functional MRI Data Using Mutual Information," in C. Taylor and A. Colchester (eds), Medical Image Computing and Computer-Assisted Intervention – MICCAI'99, Lecture Notes in Computer Science, vol. 1679, Springer, Berlin, Heidelberg, 1999, doi: 10.1007/10704282_51.
[20] Z. Kowalczuk and D. Wesierski, "Vision Guided Robot Gripping Systems," in Automation and Robotics, J. M. Ramos Arreguin (ed.), Rijeka: IntechOpen, 2008, pp. 41–72, doi: 10.5772/6264.
[21] L. Pérez, Í. Rodríguez, N. Rodríguez, R. Usamentiaga, and D. F. García, "Robot Guidance Using Machine Vision Techniques in Industrial Environments: A Comparative Review," Sensors (Basel), vol. 16, no. 3, Mar. 2016, pp. 335–361.
[22] H. Hirschmuller, "Stereo Processing by Semiglobal Matching and Mutual Information," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, Feb. 2008, pp. 328–341.
[23] L. Yan, X. Zhao and H. Du, "Research on 3D Measuring Based Binocular Vision," in IEEE International Conference on Control Science and Systems Engineering, Yantai, 2014, pp. 18–22.
[24] C. Loop and Z. Zhang, "Computing Rectifying Homographies for Stereo Vision," in Computer Vision and Pattern Recognition, vol. 1, 1999, pp. 1125–1131.
[25] V. Nozick, "Multiple View Image Rectification," in 1st IEEE International Symposium on Access Spaces (IEEE-ISAS'11), Japan, June 2011, pp. 277–282.
[26] L. Robert, C. Zeller, O. Faugeras, and M. Hebert, "Applications of Non-Metric Vision to some Visually Guided Robotics Tasks," INRIA, Tech. Rep. RR-2584, June 1995.
[27] R. I. Hartley, "Theory and Practice of Projective Rectification," International Journal of Computer Vision, vol. 35, 1999, pp. 115–127.
[28] G. Fuhr, G. P. Fickel, L. P. Dal'Aqua, C. R. Jung, T. Malzbender, and R. Samadani, "An Evaluation of Stereo Matching Methods for View Interpolation," in IEEE International Conference on Image Processing, 2013, pp. 403–407.
[29] D. Scharstein, R. Szeliski, and R. Zabih, "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms," in Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001), 2001, pp. 131–140.
Authorized licensed use limited to: UNIVERSITAETSBIBLIOTHEK CHEMNITZ. Downloaded on January 30,2022 at 20:17:58 UTC from IEEE Xplore. Restrictions apply.
[30] Tri. Priyambodo, "Grid-Edge-Depth Map Building Employing SAD with Sobel Edge Detector," International Journal on Smart Sensing and Intelligent Systems, vol. 10, no. 13, Sep. 2017, pp. 551–566.
[31] C.-H. Kim, H.-K. Lee, and Y.-H. Ha, "Disparity Space Image-Based Stereo Matching Using Optimal Path Searching," in Proc. SPIE 5022, Image and Video Communications and Processing, May 2003, pp. 752–760.
[32] M. Bleyer and M. Gelautz, "A Layered Stereo Matching Algorithm Using Image Segmentation and Global Visibility Constraints," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 59, no. 3, 2005, pp. 128–150.
[33] V. Kolmogorov and R. Zabih, "Computing Visual Correspondence with Occlusions Using Graph Cuts," in Proc. Int'l Conf. Computer Vision, vol. 2, 2001, pp. 508–515.
[34] Q. Yang, L. Wang, R. Yang, H. Stewenius, and D. Nister, "Stereo Matching with Color-Weighted Correlation, Hierarchical Belief Propagation and Occlusion Handling," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2006, pp. 492–504.
[35] C. Lei, J. Selzer, and Y.-H. Yang, "Region-Tree Based Stereo Using Dynamic Programming Optimization," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2006.
[36] J. Sun, Y. Li, S. Kang, and H.-Y. Shum, "Symmetric Stereo Matching for Occlusion Handling," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, June 2005, pp. 399–406.
[37] C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, "High-Quality Video View Interpolation Using a Layered Representation," in Proc. ACM SIGGRAPH '04, vol. 23, no. 3, 2004, pp. 600–608.
[38] A. Blake and A. Zisserman, Visual Reconstruction. London, England: The MIT Press, 1987.
[39] S. T. Barnard, "Stochastic Stereo Matching over Scale," International Journal of Computer Vision, vol. 3, no. 1, 1989, pp. 17–32.
[40] S. Geman and D. Geman, "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, no. 6, 1984, pp. 721–741.
[41] J. Marroquin, S. Mitter, and T. Poggio, "Probabilistic Solution of Ill-Posed Problems in Computational Vision," Journal of the American Statistical Association, vol. 82, no. 397, 1987, pp. 76–89.
[42] P. B. Chou and C. M. Brown, "The Theory and Practice of Bayesian Image Labeling," International Journal of Computer Vision, vol. 4, no. 3, 1990, pp. 185–210.
[43] D. Geiger and F. Girosi, "Mean Field Theory for Surface Reconstruction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 5, 1991, pp. 617–630.
[44] Y. Boykov, O. Veksler, and R. Zabih, "Fast Approximate Energy Minimization via Graph Cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, 2001, pp. 1222–1239.
[45] H. Ishikawa and D. Geiger, "Occlusions, Discontinuities, and Epipolar Lines in Stereo," in H. Burkhardt and B. Neumann (eds), Computer Vision — ECCV '98, Springer, Berlin, Heidelberg, vol. 1406, 1998, pp. 232–248.
[46] V. Kolmogorov and R. Zabih, "Computing Visual Correspondence with Occlusions Using Graph Cuts," in Proceedings Eighth IEEE International Conference on Computer Vision — ICCV 2001, vol. 2, 2001, pp. 508–515.
[47] S. Roy and I. J. Cox, "A Maximum-Flow Formulation of the N-Camera Stereo Correspondence Problem," in Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271), 1998, pp. 492–499.
[48] O. Veksler, "Efficient Graph-based Energy Minimization Methods in Computer Vision," PhD thesis, Cornell University, USA, 1999.
[49] J. Kim, V. Kolmogorov and R. Zabih, "Visual Correspondence Using Energy Minimization and Mutual Information," in Proceedings Ninth IEEE International Conference on Computer Vision, Nice, France, vol. 2, 2003, pp. 1033–1040.
[50] T. Kanade and M. Okutomi, "A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiment," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 9, 1994, pp. 920–932.
[51] L. Matthies, R. Szeliski, and T. Kanade, "Kalman Filter-Based Algorithms for Estimating Depth from Image Sequences," International Journal of Computer Vision, vol. 3, 1989, pp. 209–236.
[52] Q. Tian and M. N. Huhns, "Algorithms for Subpixel Registration," Computer Vision, Graphics, and Image Processing, vol. 35, 1986, pp. 220–233.
[53] M. Shimizu and M. Okutomi, "Precise Sub-Pixel Estimation on Area-Based Matching," in Proceedings Eighth IEEE International Conference on Computer Vision — ICCV 2001, vol. 1, 2001, pp. 90–97.
[54] S. Sah and N. Jotwani, "Stereo Matching using Multi-resolution Images on CUDA," International Journal of Computer Applications, vol. 56, no. 12, 2012, pp. 47–55.
[55] A. Ivanavičius, H. Simonavičius, J. Gelšvartas, A. Lauraitis, R. Maskeliūnas, P. Cimmperman and P. Serafinavičius, "Real-time CUDA-based Stereo Matching Using Cyclops2 Algorithm," EURASIP Journal on Image and Video Processing, 2018, doi: 10.1186/s13640-018-0253-2.
[56] M. Hallek, F. Smach and M. Atri, "Real-time Stereo Matching on CUDA Using Fourier Descriptors and Dynamic Programming," Computational Visual Media, vol. 5, no. 1, Mar. 2019, pp. 59–71.