Automated High Precision Optical Tracking of Flying Objects: Master Thesis


Automated high precision optical tracking

of flying objects

Master Thesis
Patrick Neff
Master Curriculum Geomatics Engineering
Spring Semester 2016

Supervisors:
Prof. Dr. Konrad Schindler
Prof. Dr. Alain Geiger

Advisers:
Katrin Lasinger
Dr. Sébastien Guillaume

Institute of Geodesy and Photogrammetry


Swiss Federal Institute of Technology Zurich

Date of Submission: July 4th 2016


Acknowledgements
Several people supported me during the development of this project. I would like to express
my gratitude for their contribution. I would especially like to thank my supervisors Prof. Dr.
Konrad Schindler and Prof. Dr. Alain Geiger for the opportunity to work on this fasci-
nating project and contribute to it.
I also want to thank my advisers Katrin Lasinger and Dr. Sébastien Guillaume for their
support provided during the whole work on my thesis.

Abstract
QDaedalus is a measurement system developed at ETHZ by Bürki et al. (2010). Since then it
has been constantly developed further. Recently, the system has been used to track aircraft in
flight, namely during landing and take-off. To track the aircraft automatically, the location of the
aircraft within the frame is determined and then used to update the viewing direction of the
theodolite. The current method for detecting the aircraft fails from time to time, especially
during the last part of the landing approach. To enhance the object detection and tracking, a new
method has been developed within the scope of this master thesis. The new method uses the
optical flow information between two frames in order to detect the object. The flow vectors are
then segmented into object and background. Lastly, the position of the object is filtered with a
Kalman filter to achieve a higher resistance against outliers and false detections. Even though
the developed method could not yet be implemented into the QDaedalus tracking software, tests
with the new method show that the object detection and tracking can be enhanced. The results
clearly indicate that, with the right set of parameters, the aircraft can be detected in real time.
The method also works under circumstances in which the previous method failed. The results
even suggest that the tracking works for other moving objects as well, for example a downhill skier.

Contents
1 Introduction 1

2 Theoretical Foundations 3
2.1 Object Detection and Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 Edge Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.2 Corner Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.3 On-line Boosting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.4 Discriminatively Trained Deformable Part Model (DPM) . . . . . . . . . 4
2.1.5 Tracking-Learning-Detection (TLD) . . . . . . . . . . . . . . . . . . . . . 4
2.2 Optical Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.1 Lucas-Kanade Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.2 Horn-Schunck Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.3 Region-Based Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Discrete Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3 Method 7
3.1 Current Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 New Method for Object Detection and Tracking . . . . . . . . . . . . . . . . . . 8
3.2.1 Gridsize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.2 Corner Detection to Calculate OF . . . . . . . . . . . . . . . . . . . . . . 8
3.2.3 Calculate Optical Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.4 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Segmentation based on Absolute Values . . . . . . . . . . . . . . . . . . . 11
Segmentation based on Angles . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2.5 Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Determining the Variances for each Image . . . . . . . . . . . . . . . . . . 16

4 Results & Discussion 19


4.1 Comparison to previous Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Results with Test Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2.1 Influence of Grid Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2.2 Influence of Corner Threshold . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2.3 Camera Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3 Time cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.4 Tracking Different Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

5 Conclusion & Outlook 31

References 33

List of Figures 35

1 Introduction
QDaedalus is a measurement system developed by Bürki et al. (2010) for automated astrogeodetic
measurements. The system consists of a theodolite and a camera sensor. The camera replaces the
eyepiece of the theodolite. This allows taking camera images with very accurate corresponding
angle measurements. This makes it possible to calculate the 3D position of an object in
space, provided that two systems with known positions and orientations measure the same object.
QDaedalus has been constantly updated and expanded. Conzett (2014) used it to track aircraft
instead of objects in space. Nüssli and Salzgeber (2015) then refined the process, allowing a
single operator to control a station and follow the aircraft automatically. This led to a tracking
system, based on QDaedalus, which is able to track aircraft and determine their position in
post-processing.
An interesting application for this tracking system is to track aircraft during their final landing
approach. In this phase of the flight, precise positioning is crucial. To verify aerial positioning
systems, such as GNSS or ILS for an aircraft, it is essential to have an independent control
system with the same or preferably higher accuracy. The final landing approach, however, is
where most problems occur with the current tracking algorithm implemented by Nüssli and
Salzgeber (2015). They use an edge detection algorithm. This yields good results as long as the
background is homogeneous, but as soon as the background contains more edges, the algorithm
loses track of the object. This makes it necessary to have a more robust tracking algorithm, one
that also operates with a more complex background.
A possible solution is to use the optical flow information of two consecutive frames to track
the object. Optical flow is an intuitive solution to the problem at hand, because the desired
object is moving by its very nature. The main challenge is to have a slim and fast algorithm,
because the system works at a frequency of 20 Hertz. This means an image is taken every
0.05 seconds, then processed, and the information about the object is used in a last step to
reorient the theodolite. To test a new method, the various datasets acquired during multiple
measurement campaigns are available. Because these datasets were recorded with the previous
detection algorithm, there is no data for the cases in which that method fails. In order to also
have data for such cases, a few tracks were recorded with manual tracking.
From these requirements the goal of this thesis is derived: a robust object detection and
tracking algorithm which allows tracking the flying object regardless of the background, under
the precondition that the algorithm can be run at a frequency of 20 Hertz.
The following thesis elaborates how this goal has been achieved. First, an overview of
the necessary theoretical foundations is provided. Next, a detailed description of the newly
developed algorithm is given, followed by a chapter about the results of the algorithm; lastly, the
conclusion and an outlook for further research are discussed.

2 Theoretical Foundations
In this chapter the theoretical foundations needed to understand the new detection and tracking
algorithm are explained. In addition to the subjects used in later chapters, there are also a few
additional topics explained to give more theoretical context to this thesis.

2.1 Object Detection and Tracking


To reliably detect and track the flying object is the core task of this thesis. There are several
methods and approaches to solve this task. In the following sections a few of them are introduced.

2.1.1 Edge Detection


A very simple approach to detect an object is edge detection. The precondition for this to work
is that the object has the highest edge count within the image. What might seem far-fetched
is a reasonable idea, considering that flying objects, especially aircraft, often appear against the
sky only. If the sky is clear, the background of the object is very homogeneous and without any
edges.
There are several ways to detect edges. For example, the Sobel operator developed by Sobel
and Feldman (Sobel, 2014) uses two 3x3 convolution matrices as seen in Equations 2.1 and 2.2.
The result of the convolution is an image containing the edges (Szeliski, 2010). Another, somewhat
more sophisticated approach is the Canny algorithm. For this algorithm a smoothing mask is
applied to the image before detecting edges in order to suppress noise. In addition to the smoothing,
after a pixel is detected as an edge, a non-maximum suppression is carried out. This thins out
the edges to a thickness of only one pixel. A pixel is also only added to an edge if it either
has a large edge strength or is a neighbour of an already detected edge. The results are
thin edges that are free from noise. (Canny, 1986)
   
$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \quad (2.1) \qquad G_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \quad (2.2)$$
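As an illustration of the two detectors described above, the following is a minimal sketch using OpenCV's C++ interface; the file name, kernel size, and Canny thresholds are illustrative choices, not values taken from the QDaedalus software.

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>

// Minimal sketch: Sobel gradients and Canny edges with OpenCV.
// File name and parameter values are illustrative.
int main() {
    cv::Mat img = cv::imread("frame.png", cv::IMREAD_GRAYSCALE);

    // Sobel: convolve with the two 3x3 kernels of Equations 2.1 and 2.2.
    cv::Mat gx, gy;
    cv::Sobel(img, gx, CV_16S, 1, 0, 3);   // horizontal gradient G_x
    cv::Sobel(img, gy, CV_16S, 0, 1, 3);   // vertical gradient G_y
    cv::Mat absGx, absGy, edges;
    cv::convertScaleAbs(gx, absGx);
    cv::convertScaleAbs(gy, absGy);
    cv::addWeighted(absGx, 0.5, absGy, 0.5, 0, edges);  // combined edge image

    // Canny: smoothing, non-maximum suppression and hysteresis thresholding
    // are carried out internally after the gradient computation.
    cv::Mat canny;
    cv::GaussianBlur(img, img, cv::Size(5, 5), 1.5);
    cv::Canny(img, canny, 50, 150);
    return 0;
}
```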

2.1.2 Corner Detection


With the same consideration as for the edge detection, it makes sense to detect corners in order
to detect the object. Again this only makes sense if the object holds the most corners. This is
rather seldom the case, but for applications with flying objects it is still a reasonable assumption. A possible way to
detect corners is using the minimal eigenvalues of the Hessian matrix. The Hessian matrix contains
the sums of the second derivatives in both directions over a predefined window. (Mikolajczyk et
al., 2005)
$$H = \begin{bmatrix} \sum \dfrac{\delta^2 I}{\delta x^2} & \sum \dfrac{\delta^2 I}{\delta x \delta y} \\[6pt] \sum \dfrac{\delta^2 I}{\delta x \delta y} & \sum \dfrac{\delta^2 I}{\delta y^2} \end{bmatrix} \quad (2.3)$$
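The thesis uses the smaller eigenvalue of this matrix as the corner measure (see Chapter 3.2.2). Stated explicitly, under the usual formulation a pixel is accepted as a corner when

$$\min(\lambda_1, \lambda_2) > \tau,$$

where $\lambda_1, \lambda_2$ are the eigenvalues of $H$ and $\tau$ is a user-defined threshold.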
There are many other corner detection algorithms. For example, a very similar approach is the Harris
corner detector (Harris and Stephens, 1988); other approaches are SIFT (Lowe, 1999) or SURF (Bay et al.,
2006).

2.1.3 On-line Boosting

The basic idea of real-time tracking via on-line boosting is to consider the tracking problem
as a binary classification problem between background and object. The classifiers are then
constantly updated to best track the target object. The on-line boosting approach requires an
initialisation with a positive image sample of the object. From the same image, a set of negative
samples is also produced. These sets of samples are used in a next step to iteratively obtain a
first model of the classifier. With this first model the initialisation is done. The current classifier
is then evaluated at each point in a defined region of interest and each position gets a confidence
value. The confidence map is analysed and the target is shifted to the new location. With the
new object detected, the classifier is updated to catch slow changes in the appearance of the
object. To generate the classifiers three different types of features are used: Haar-like features,
orientation histograms and local binary patterns. (Grabner and Bischof, 2006)

2.1.4 Discriminatively Trained Deformable Part Model (DPM)

The discriminatively trained DPM is built on a pictorial structures framework. This means
an object is represented by a collection of parts, where the configuration of those parts is
deformable. The deformable configuration is characterised by spring-like connections between
certain pairs of parts. The difficult part of this approach is the learning. This is tackled
with a new method for discriminative training with partially labelled data: a margin-sensitive
approach for data-mining hard negative examples is combined with a so-called latent SVM.
(Felzenszwalb et al., 2010)

2.1.5 Tracking-Learning-Detection (TLD)

TLD is a framework for long-term tracking (tracking an object in a video stream at frame rate
for an indefinitely long time) that decomposes the task into three sub-tasks, namely, as the name
of the framework suggests, tracking, learning and detection. The tracker follows the object through
all frames, the detector localises the occurrence of the object in a frame and corrects the tracker
if necessary, and the learning part estimates the errors of the detector. The errors are then used
to update the detector to avoid these errors in future frames. (Kalal et al., 2010)

2.2 Optical Flow


Optical flow is a general motion estimation approach to detect motion on a pixel-level. This
generally includes minimizing the brightness or colour difference between two corresponding
pixels in subsequent frames summed over the image. This problem is under-constrained and

to solve this, two classic approaches evolved: perform the summation locally or introduce
smoothness terms and search for a global minimum. (Szeliski, 2010)

2.2.1 Lucas-Kanade Method

The Lucas-Kanade algorithm (Lucas and Kanade, 1981) is a sparse approach and therefore
solves the under-constrained problem with the summation locally. Information is only derived
from a small window surrounding the pixel of interest. The Lucas-Kanade algorithm is based on
three different assumptions:

• Brightness consistency It is assumed that a pixel does not change its brightness from frame
to frame even if it changes its position.
• Temporal persistence A surface patch slowly changes its image motion over time. This
means the object does not move far from one frame to another.
• Spatial coherence Neighbouring pixels belong to the same surface. This means they move
in a similar manner and project to nearby points on the image plane.

The first two assumptions lead to Equation 2.4. But this equation alone is still under-constrained,
and that is where the third assumption comes in. If it is assumed that neighbouring points
have the same flow, additional equations can be added. Usually not just one neighbouring pixel
is taken, but rather the whole neighbourhood. For example, a 7x7 neighbourhood leads to 49
equations, and the system then looks as shown in Equation 2.5. This over-constrained problem is
solved with a least-squares adjustment.
   
$$I_x u + I_y v + I_t = 0 \quad (2.4)$$

$$\begin{bmatrix} I_x(p_1) & I_y(p_1) \\ I_x(p_2) & I_y(p_2) \\ \vdots & \vdots \\ I_x(p_{49}) & I_y(p_{49}) \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = - \begin{bmatrix} I_t(p_1) \\ I_t(p_2) \\ \vdots \\ I_t(p_{49}) \end{bmatrix} \quad (2.5)$$
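Spelling out the least-squares adjustment mentioned above: with $A$ the 49x2 matrix of spatial derivatives from Equation 2.5 and $b$ the negated vector of temporal derivatives, the flow of the window is

$$\begin{bmatrix} u \\ v \end{bmatrix} = (A^T A)^{-1} A^T b, \qquad A^T A = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}.$$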

This window should be chosen as big as possible in order to also catch large motions. But by
choosing a large window, the coherent motion assumption is more likely to be broken. To cope with
this problem an image pyramid is used. The larger spatial scales are solved first and the thus
derived motion velocity is assumed for lower levels until the raw image pixel is reached. (Bradski
and Kaehler, 2008)

2.2.2 Horn-Schunck Method

The Horn-Schunck algorithm (Horn and Schunck, 1981) uses a smoothness constraint on the
velocity in x and y direction. It also uses the previously stated brightness consistency over the
image. In Equations 2.6 and 2.7, α is a constant weighting coefficient. Using only α in this way is
a rather simple approach, which penalises regions with changing flow magnitude.

$$\frac{\delta}{\delta x}\frac{\delta v_x}{\delta x} - \frac{1}{\alpha} I_x (I_x v_x + I_y v_y + I_t) = 0 \quad (2.6)$$

$$\frac{\delta}{\delta y}\frac{\delta v_y}{\delta y} - \frac{1}{\alpha} I_y (I_x v_x + I_y v_y + I_t) = 0 \quad (2.7)$$

(Bradski and Kaehler, 2008)

2.2.3 Region-Based Matching


The previously described methods are both differential techniques that compute velocity from
spatio-temporal derivatives of image intensity. This numerical differentiation might not work
perfectly for example due to noise, a small number of frames, or aliasing in the image acquisition.
For a region-based method the velocity is defined as the shift that best fits image regions at
different times. (Barron et al., 1994)

2.3 Discrete Kalman Filter


The Kalman filter is a filtering technique developed by Rudolf Emil Kalman in 1960. The
filter is still widely used as the standard tracking solution. Its purpose is to continuously estimate
the state of a system according to its physical behaviour. The linear dynamic process, or system
model, in the Kalman filter is defined as follows:

xk = Φk−1 xk−1 + Γk−1 wk−1 wk ∼ N (0, Qk ) (2.8)

$x_k$ is the state of the system at the given time $k$. It is calculated by multiplying the system
transition matrix, or propagator, $\Phi_{k-1}$ with the previous state vector $x_{k-1}$. To this, the
error sources are added. For the Kalman filter these are additive Gaussian noise terms of known variances;
this means $w_{k-1}$ and $v_k$ are zero-mean Gaussian sequences with the given covariance matrices $Q_k$
and $R_k$. The measurement model is defined as follows:

zk = Hk xk + vk vk ∼ N (0, Rk ) (2.9)

$z_k$ is a vector containing the observations or measurements and is defined as a function of the state $x_k$.
$H_k$ is called the measurement sensitivity matrix or observation matrix and is determined by
how the observations are affected by the state. The Kalman filter consists of two stages: the
prediction and the update. For the prediction, the a posteriori knowledge of the previous time
epoch, $x_{k-1}^{(+)}$ and $P_{k-1}^{(+)}$, is used to calculate the a priori knowledge of the current time epoch,
$x_k^{(-)}$ and $P_k^{(-)}$. The update calculates the a posteriori states of the current epoch, $x_k^{(+)}$ and $P_k^{(+)}$, as a
linear combination of the a priori knowledge and the new measurements. For this the weighting
matrix $K_k$, the Kalman gain, is used. (Van Gool et al., 2010; the notation is adopted from Geiger
et al., 2013)

3 Method
This chapter will firstly discuss the currently used method to detect and track an aircraft with
the QDaedalus tracking software. It then provides a detailed description of functionality of the
newly developed tracking algorithm. The new algorithm uses optical flow (OF) for the object
detection and a Kalman filter to enhance the object tracking from one frame to the next.

3.1 Current Procedures


The currently used approach for the automated tracking of aircraft is based on edge detection
by Nüssli and Salzgeber (2015). In a first step they use the Sobel operator to detect edges. The
image is then divided into small blocks and the number of pixels detected as an edge is counted
within each block. Regions are formed in this new image, condensing the blocks into pixels with
the number of edges as their value. All neighbouring pixels with a non-zero value, meaning they
contain edges, are combined into a region. If the background of the aircraft is homogeneous, there
would be only one region as a result of the above-mentioned steps, namely the aircraft itself. To
cope with backgrounds containing many edges, the following features of the regions are analysed
to make the detection of the aircraft more robust.

• Maximum intensity: the region with the highest count of edges


• Extent: Ratio between area of the region and the area of the bounding box surrounding
said region
• Eccentricity: Gives a measure how circular shaped the ellipse fitted into the region is

(a) Image with falsely detected object (b) Detected regions

Figure 3.1 The yellow dot is where the edge detection algorithm detects the aircraft

With these features a crude distinction between background regions and the aircraft can be
accomplished. This technique allows to detect and track the aircraft even if the background is

not completely homogeneous. However, if the aircraft is in front of a background containing
strong and numerous edges the algorithm is likely to fail. Figure 3.1 illustrates the problem with
the current method accurately.

3.2 New Method for Object Detection and Tracking


In the previous section it was pointed out that the current method fails in some situations.
To enhance the object tracking, OF is a good additional characteristic of the moving object
to use for its detection. Other methods exist, such as the deformable part model
(Felzenszwalb et al., 2010) or tracking-learning-detection (Kalal et al., 2010). Unlike the other
methods presented in Chapter 2.1, the new method does not need any prior information: neither
a library with potential parts of the desired object nor an initial position with a bounding box
is necessary. The only precondition for the approach using OF is that the object moves
relative to the background. This condition is fulfilled by the very nature of the set task,
namely to track moving objects. However, it is not enough to just calculate the OF of two frames to
determine the object. In this thesis the calculated flow vectors are segmented into object and
background. To achieve a more robust tracking of the object, the results from the segmentation
are then filtered with a Kalman filter. The code for this algorithm is written in MATLAB and
C++ using the OpenCV library. The detection of good pixels to track and the calculation of the OF are
done in C++, the rest in MATLAB. A detailed schema of the whole algorithm for detecting
and tracking objects, including which coding language is used for which part, is shown in Figure 3.2.
Summing up, the main components are calculating the OF, segmenting the results into object
and background, and using a Kalman filter to smooth out the results.

3.2.1 Gridsize

A straightforward way would be to calculate the OF for every single pixel in the image. With
the QDaedalus camera's resolution of 768x768 pixels, this would mean calculating the OF for a
total of 589,824 pixels. Since the algorithm should work in real time (20 Hertz), the 3
seconds it takes to calculate the OF this way would be far too long. The solution to this time problem is
using only a subset of the available pixels. For good results concerning the calculation of the OF
it would be preferable to select only good pixels to track for the further calculations. However,
testing each pixel for its suitability to calculate OF takes too long as well. Therefore, an even grid of
pixels is chosen. Empirical testing reveals that the optimum grid size to achieve good results at
small computational cost lies at 32x32. How different grid sizes affect the quality of the results is shown in
Chapter 4.2.1.

3.2.2 Corner Detection to Calculate OF

In order to calculate a robust OF, the pixels have to be chosen carefully. Not all of the above-mentioned
grid pixels are suitable for calculating the OF. In this algorithm, minimal
eigenvalues of the Hessian matrix (see Chapter 2.1.2) are used to detect corners and therefore
good points at which to calculate the OF. The OpenCV function cvMinEigenVal is used in C++. The

Figure 3.2 The flowgraph of the implemented algorithm to detect and track objects. The blue boxes are
calculated in C++ and the white ones in MATLAB

parameter for the window size is set to 31 pixels and the threshold to distinguish between good
and bad corners is set to 0.0003. For this thesis it has not been tested which corner detection
method works best for the set task. In a few cases the minimal eigenvalue method fails, and it could
therefore be useful to implement other corner detection methods. When and why the minimal
eigenvalue method fails is shown in Chapter 4.2.2.
Figure 3.3 shows the image and the overlaid grid with the results of the search for good pixels to
track in red. As long as the background is homogeneous these pixels correspond with the object
pixels. This is due to the fact that only the object holds corners. Therefore, this information is
used in the Kalman filter as observations (see Chapter 3.2.5). As soon as corners are also
detected in the background, the object position provided by the corner detection is no
longer accurate. To cope with this, the covariances are adjusted to give the falsely detected
object less weight (details on the adjusted variances in Chapter 3.2.5).

Figure 3.3 Shows the overlaid grid with red dots representing corners detected by minimal eigenvalues.
On the left an image with homogeneous background; on the right a heterogeneous background
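The following is a minimal sketch of how the grid points could be filtered by their minimal-eigenvalue response using OpenCV's C++ interface; cv::cornerMinEigenVal is the C++ counterpart of the function mentioned above, the window size (31) and threshold (0.0003) repeat the values stated in the text, and the helper function itself is hypothetical.

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

// Sketch: keep only grid points whose minimal-eigenvalue response exceeds
// the threshold. Window size (31) and threshold (0.0003) follow the text;
// everything else is illustrative.
std::vector<cv::Point2f> goodGridPoints(const cv::Mat& gray, int gridSize)
{
    cv::Mat response;
    cv::cornerMinEigenVal(gray, response, 31);   // min eigenvalue per pixel

    std::vector<cv::Point2f> corners;
    const float stepX = static_cast<float>(gray.cols) / gridSize;
    const float stepY = static_cast<float>(gray.rows) / gridSize;
    for (int i = 0; i < gridSize; ++i) {
        for (int j = 0; j < gridSize; ++j) {
            const int x = static_cast<int>((j + 0.5f) * stepX);
            const int y = static_cast<int>((i + 0.5f) * stepY);
            if (response.at<float>(y, x) > 0.0003f)   // good corner to track
                corners.emplace_back(static_cast<float>(x), static_cast<float>(y));
        }
    }
    return corners;
}
```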

3.2.3 Calculate Optical Flow


The OF is only calculated for the detected corners. For this the Lucas-Kanade method is used.
As described in Chapter 2.2.1 the Lucas-Kanade method uses only a local window and does not
smooth over the whole image. It is therefore not a dense approach; the pixels for which the flow
should be calculated can be selected. Without this sparse approach it would not be possible to
calculate the OF only at good pixels to track. This means the Lucas-Kanade method allows
the algorithm to be faster, because the flow is only calculated for a subset of pixels, and also
more robust, because it is only calculated at distinctive pixels. For these reasons the
flow is not calculated with one of the other possible OF calculation methods. In the code the
OpenCV function cvCalcOpticalFlowPyrLK is used. This function requires as inputs the
two consecutive frames and the grid pixels at which the flow should be calculated. The
window size is also required and is set to 21 pixels. In order to also catch bigger displacements,
image pyramids are formed; their number is one of the input parameters of the function. In this
algorithm 5 pyramid levels are calculated. The result is a vector containing the new positions of
the starting grid pixels. The flow vectors are then calculated as the differences in x and y direction
between the grid pixels and their new positions. All flow vectors obtained in this manner are then
used for the segmentation.
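A minimal sketch of this step with OpenCV's C++ interface is shown below; the window size (21 pixels) and the five pyramid levels follow the text, while the function name and the handling of untracked points are illustrative.

```cpp
#include <opencv2/video/tracking.hpp>
#include <vector>

// Sketch: pyramidal Lucas-Kanade flow at the selected grid corners.
// Window size (21 px) and pyramid depth (5) follow the text; note that
// OpenCV's maxLevel argument is the index of the coarsest pyramid level.
std::vector<cv::Point2f> flowVectors(const cv::Mat& prev, const cv::Mat& next,
                                     const std::vector<cv::Point2f>& pts)
{
    std::vector<cv::Point2f> nextPts;
    std::vector<unsigned char> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prev, next, pts, nextPts, status, err,
                             cv::Size(21, 21), 5);

    // The flow vector is the displacement of each successfully tracked point.
    std::vector<cv::Point2f> flow(pts.size(), cv::Point2f(0.f, 0.f));
    for (size_t i = 0; i < pts.size(); ++i)
        if (status[i])
            flow[i] = nextPts[i] - pts[i];
    return flow;
}
```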

3.2.4 Segmentation
The OF vectors are split into their two components: The absolute value and the direction of each
vector. The values of those two components are then segmented into object and background.

Segmentation based on Absolute Values

Because the camera in the QDaedalus system is mounted on a tachymeter, exact angular
measurements exist. The difference of one arc-second equates to roughly 4 pixels in the image
(Equations 3.1 and 3.2).

$$\text{OF in x direction} = \frac{\Delta \text{Vertical Angle} \cdot 3600}{4} \quad (3.1)$$

$$\text{OF in y direction} = \frac{\Delta \text{Horizontal Angle} \cdot 3600}{4} \quad (3.2)$$
This converted angular velocity should have the same length as the flow of the background pixels
but facing the opposite direction. The sum of those two vectors should therefore be zero. This
means that all background pixels should have an absolute value of zero or at least close to zero.
Figure 3.4 shows the OF once with the camera flow subtracted and once as the raw OF. There is,
however, a problem with the angular readings of the tachymeter. Because the angles can only be read at a
frequency of 10 Hertz, it is not possible to synchronise the readings with the camera sensor. This
means the angles have to be interpolated to the time at which the camera takes a picture. A linear
interpolation is done for the angular readings. Because the camera motion is not necessarily
linear, the angular velocity between two frames (and therefore the discrete derivative) might
be corrupted. Results of the camera flow compared to the ground truth flow are discussed in
Chapter 4.2.3. In the algorithm proposed here, nothing is done to cope with these possibly faulty
angular readings. If the segmentation is wrong because of this, the general outlier detection
should catch the fault.
The histogram of the absolute values (Figure 3.5) shows two peaks. One is close to
zero and the other one is, in this case, at 16 pixels. It is safe to assume that this configuration
is typical if there is a moving object in the image. For the segmentation this means that pixels
with values around the lower peak are background, and pixels with values around the higher
peak belong to the object. To make the distinction between object and background more robust,
outliers have to be detected and removed. The first outliers are detected on the basis of the
histogram (Figure 3.5): values too far away from the peaks are treated as outliers. The maximum
distance from a peak is the grid size divided by 16. If only one peak is found in the
histogram, the variances for this observation are set to 999, because the detection is most likely
false. Secondly, regions that are too small are detected and removed as outliers. The minimal region size
is calculated by dividing the grid size by 7 and rounding up to the next integer. For a grid
size of 32x32 this equates to a minimal region size of 5 pixels.
Figure 3.4 Reduced OF: On the left the camera flow is subtracted, vectors of the aircraft point show
the heading of the aircraft and vectors of the background are close to 0; on the right the camera flow is
not subtracted and shows the raw OF components

Figure 3.5 Histogram of the flow vectors absolute values: Shows a peak at 1.5 pixels (background) and
one at 16 pixels (object)

In order to remove such regions, the results of the segmentation (the positive pixels, which indicate
the object) are examined. It is determined how many separate regions exist, and regions smaller than
the previously explained threshold are deleted. To fill small gaps between two detected regions, the
regions are dilated by two pixels and then eroded again by two pixels. The variances for the Kalman
filter also depend on properties of the regions detected as object; further details on the calculation
of the variances are given in Chapter 3.2.5.
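The following sketch illustrates the magnitude-based segmentation described above, assuming the camera flow has already been subtracted from each vector. The histogram binning and the split used to separate the two peaks are simplifications of the peak search; only the tolerance (grid size divided by 16) is taken from the text.

```cpp
#include <opencv2/core.hpp>
#include <vector>
#include <algorithm>
#include <cmath>

// Sketch of the magnitude-based segmentation: histogram of flow magnitudes
// (camera flow already subtracted), peak search, and labelling of vectors
// near the higher peak as object. Splitting the histogram at one quarter of
// its range to separate the two peaks is a simplification; the tolerance
// (grid size / 16) follows the text.
std::vector<bool> segmentByMagnitude(const std::vector<cv::Point2f>& flow,
                                     int gridSize)
{
    std::vector<float> mag(flow.size());
    for (size_t i = 0; i < flow.size(); ++i)
        mag[i] = std::hypot(flow[i].x, flow[i].y);

    // 1-pixel-wide histogram bins of the magnitudes.
    const int nBins = 64;
    std::vector<int> hist(nBins, 0);
    for (float m : mag)
        hist[std::min(nBins - 1, static_cast<int>(m))]++;

    // The peak at higher magnitudes is taken as the moving object; the peak
    // near zero (background) would be located analogously.
    const long objPeak =
        std::max_element(hist.begin() + nBins / 4, hist.end()) - hist.begin();

    // Keep only magnitudes close enough to the object peak.
    const float tol = static_cast<float>(gridSize) / 16.0f;
    std::vector<bool> isObject(flow.size());
    for (size_t i = 0; i < flow.size(); ++i)
        isObject[i] = std::abs(mag[i] - static_cast<float>(objPeak)) <= tol;
    return isObject;
}
```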

Segmentation based on Angles

The principle for the segmentation of the angular component of the flow is basically the same as
for the absolute component. The main difference is that the camera flow is not subtracted. The
subtraction does not work well here because the directions of background flow vectors with absolute
values close to zero are prone to noise: small inaccuracies lead to a high variability in the directions
of the vectors. The direction histogram is also segmented and two peaks are expected (Figure 3.6).
But instead of taking the peak at higher values as the object, the peak further away from
the direction of the camera flow is taken.

Figure 3.6 Histogram of the flow vector directions: shows a peak at 2.5 radians (object) and one at
4.75 radians (background)

The outlier detection in the segmentation for the direction is the same as in the segmentation for
the absolute values: points too far away from the histogram peaks and regions that are too small are removed.

3.2.5 Kalman Filter

The Kalman filter is used to smooth the result and is also able to adjust the impact of the
three different observations according to their accuracy. For this the discrete Kalman filter as
described in chapter 2.3 is used. This means the filter is defined as shown in Equations 3.3 and 3.4.

xk = Φk−1 xk−1 + Γk−1 wk−1 wk ∼ N (0, Qk ) (3.3)

zk = Hk xk + vk vk ∼ N (0, Rk ) (3.4)

In total this Kalman filter uses twelve observations: x and y coordinates from the absolute values,
the directions, and the corners respectively, plus, for each of these three components, two observations
for the length (l) and width (w) of the bounding box (Equation 3.5). All entries in the observation
vector are measured in pixels.
The state vector of the filter consists of six elements: the x and y coordinates of the object, the
velocities ẋ and ẏ of the object, and the length and width of the bounding box (Equation 3.6).
The position as well as the dimensions of the bounding box are measured in pixels. The velocity
is measured in pixels per frame.

$$z_k = \begin{bmatrix} x_{abs} & y_{abs} & x_{dir} & y_{dir} & x_{cor} & y_{cor} & l_{bbox\,abs} & w_{bbox\,abs} & l_{bbox\,dir} & w_{bbox\,dir} & l_{bbox\,cor} & w_{bbox\,cor} \end{bmatrix}^T \quad (3.5)$$

$$x_k = \begin{bmatrix} x & y & \dot{x} & \dot{y} & x_{bbox} & y_{bbox} \end{bmatrix}^T \quad (3.6)$$

The observation matrix $H_k$ is deduced from the conversion of the state vector to the observation
vector and therefore looks as follows:

$$H_k = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}^T \quad (3.7)$$

The propagator $\Phi_k$ and the matrix $\Gamma_k$ are defined as follows:

$$\Phi_k = \begin{bmatrix} 1 & 0 & \Delta t & 0 & 0 & 0 \\ 0 & 1 & 0 & \Delta t & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \quad (3.8)$$

$$\Gamma_k = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ \Delta t & 0 & 0 & 0 \\ 0 & \Delta t & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad (3.9)$$

with ∆t set to 1, because time is measured in frames for this filter and not seconds.
The system noise matrix $Q_k$ is a 4x4 matrix with $\sigma_{v_x}^2$, $\sigma_{v_y}^2$, $\sigma_{bbox_l}^2$, and $\sigma_{bbox_w}^2$ as its diagonal values.
All these variances are set to 1 (Equation 3.10); setting them to 1 yields good results.

$$Q_k = \begin{bmatrix} \sigma_{v_x}^2 & 0 & 0 & 0 \\ 0 & \sigma_{v_y}^2 & 0 & 0 \\ 0 & 0 & \sigma_{bbox_l}^2 & 0 \\ 0 & 0 & 0 & \sigma_{bbox_w}^2 \end{bmatrix} \quad (3.10)$$

The observation noise matrix $R_k$ is a 12x12 matrix with the $\sigma^2$ of each observation as its
diagonal elements. These variances change from image to image. The following subsection
explains in detail how these variances are determined.
Equations 3.11 to 3.15 show how the filter is implemented in the code. Equations 3.11 and 3.12
are the extrapolation of the previous state and covariance matrix with the given system model.
Equation 3.13 calculates the Kalman gain matrix $K_k$, which is then used in Equations 3.14 and 3.15
to update the state and covariance.

$$\hat{x}_k^{(-)} = \Phi_{k-1} \hat{x}_{k-1}^{(+)} \quad (3.11)$$

$$P_k^{(-)} = \Phi_{k-1} P_{k-1}^{(+)} \Phi_{k-1}^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T \quad (3.12)$$

$$K_k = P_k^{(-)} H_k^T (H_k P_k^{(-)} H_k^T + R_k)^{-1} \quad (3.13)$$

$$\hat{x}_k^{(+)} = \hat{x}_k^{(-)} + K_k (z_k - H_k \hat{x}_k^{(-)}) \quad (3.14)$$

$$P_k^{(+)} = (I - K_k H_k) P_k^{(-)} \quad (3.15)$$

The initial state vector $x_0$ and covariance matrix $P_0$ are shown in Equations 3.16 and 3.17.

$$x_0 = \begin{bmatrix} \frac{griddim}{2} \\ \frac{griddim}{2} \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} \quad (3.16) \qquad P_0 = \begin{bmatrix} 50 & 0 & 0 & 0 & 0 & 0 \\ 0 & 50 & 0 & 0 & 0 & 0 \\ 0 & 0 & 50 & 0 & 0 & 0 \\ 0 & 0 & 0 & 50 & 0 & 0 \\ 0 & 0 & 0 & 0 & 50 & 0 \\ 0 & 0 & 0 & 0 & 0 & 50 \end{bmatrix} \quad (3.17)$$
The x and y positions of the initial state vector are chosen at the middle of the overlaid grid,
with a velocity of 0. The initial bounding box size is set to 1. In the covariance matrix only the
diagonal elements, the variances of each state element, are set to a non-zero value. The values
are chosen very high, at 50, because the initial estimated state vector can differ greatly from the
first detected one.
As a last outlier detection, the difference between the predicted state and the observation is
calculated. If the difference is larger than 50 times the corresponding variance in the covariance
matrix P, the variance for the observation is set to 999 in order to give it very little weight. The
idea behind this check is that it is highly unlikely for the tracked object to move more than 50
times the variance between two frames.
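As an illustration of Equations 3.11 to 3.15, the sketch below performs one prediction and update cycle with OpenCV matrices. In the thesis this part runs in MATLAB, so the C++ form and all names here are purely illustrative.

```cpp
#include <opencv2/core.hpp>

// Sketch of one Kalman filter cycle (Equations 3.11 - 3.15) with cv::Mat.
// Matrix dimensions follow the text (6 states, 12 observations, 4 noise terms).
struct KalmanState {
    cv::Mat x;   // 6x1 state estimate
    cv::Mat P;   // 6x6 state covariance
};

void kalmanStep(KalmanState& s,
                const cv::Mat& Phi,    // 6x6   transition matrix
                const cv::Mat& Gamma,  // 6x4   system noise mapping
                const cv::Mat& Q,      // 4x4   system noise covariance
                const cv::Mat& H,      // 12x6  observation matrix
                const cv::Mat& R,      // 12x12 observation noise covariance
                const cv::Mat& z)      // 12x1  observations of the current frame
{
    // Prediction (Eqs. 3.11 and 3.12)
    cv::Mat x_pred = Phi * s.x;
    cv::Mat P_pred = Phi * s.P * Phi.t() + Gamma * Q * Gamma.t();

    // Kalman gain (Eq. 3.13)
    cv::Mat S = H * P_pred * H.t() + R;
    cv::Mat K = P_pred * H.t() * S.inv();

    // Update (Eqs. 3.14 and 3.15)
    s.x = x_pred + K * (z - H * x_pred);
    s.P = (cv::Mat::eye(6, 6, CV_64F) - K * H) * P_pred;
}
```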

Determining the Variances for each Image

In order to achieve a robust object tracking, it is necessary to have adapting variances, because
the segmentation does not always work perfectly, for example when the camera flow does
not match the true background flow (see Chapter 3.2.4). Also, the variance for the object
position based on the detected corners must increase as soon as the background is no longer
homogeneous. As already mentioned, the properties of the pixels detected as object
are used for calculating the variances.
Equation 3.18 shows how the extent is calculated. It is the ratio of the pixels detected as object
to all pixels inside the bounding box, which means the extent can take values
between 0 and 1. The σ is then calculated in a second step as seen in Equation 3.19. The underlying
assumptions of these equations are, on the one hand, that an object is unlikely to be split into
more than one part; if the object is represented by more than one region, those regions would have to be
close together. On the other hand, a rather compact shape of the object is assumed. For these
equations an extent of 1 would be ideal to achieve the smallest variance possible. An extent of 1
means that all pixels inside the bounding box are detected as object and the detected object has
the shape of a rectangle. In reality an aircraft does not have the shape of a rectangle; therefore,
the extent of a perfectly detected aircraft would not be 1, but a little smaller. The ideal value is
nevertheless set to 1 to keep the algorithm as general as possible and to make as few assumptions
about the shape of the tracked object as possible.

$$\text{Extent} = \frac{\text{Pixels detected as Object}}{\text{Total Pixels inside Bounding Box}} \quad (3.18)$$

$$\sigma = \frac{\text{Number of Regions} + \text{Total Distance between all Regions}}{2 \cdot \text{Extent}} \quad (3.19)$$
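A short sketch of Equations 3.18 and 3.19 is given below; the inputs are assumed to be available from the region analysis described above, and the function name is hypothetical.

```cpp
// Sketch of the variance heuristic of Equations 3.18 and 3.19. The inputs
// (object pixel count, bounding box area, region count, summed distances
// between regions) come from the region analysis described in the text.
double observationSigma(int objectPixels, int boundingBoxPixels,
                        int numRegions, double totalRegionDistance)
{
    const double extent =
        static_cast<double>(objectPixels) / boundingBoxPixels;       // Eq. 3.18
    return (numRegions + totalRegionDistance) / (2.0 * extent);      // Eq. 3.19
}
```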

Figure 3.7 shows two examples of how the result of the segmentation can look. The white
pixels are the ones identified as object. In the left image the segmentation did not work: five regions are
detected and they are very far apart. This means the variance for this measurement is very high
and therefore the impact of this observation on the filtered object position is very small. The
image on the right side shows an almost perfect segmentation. All pixels of the object and no

more are detected. This leads to a small variance and the impact of this observation on the
filtered object position will be high.

Figure 3.7 Variance calculation example: On the left an example when the segmentation did not work
well, many regions that are far apart (Note: regions too small are not taken into account); on the right
side an example of good segmentation

4 Results & Discussion

4.1 Comparison to previous Method

For the final landing approach of an aircraft, the edge detection method failed in most
cases where the background contains anthropogenic structures such as houses or roads. However,
depending on the lighting and the setting of the edge thresholds, the previous method could usually deal
with a horizon line moving into the picture (Figure 4.1). As discussed in the introduction
this is not enough: the object detection and tracking has to be enhanced in order to have a
reliable automated tracking of the object.

Figure 4.1 Tracking with edge detection: left shows edge detection with forest in the background; right
shows houses in the background and the edge detection fails

With the new algorithm it is possible to enhance the detection and tracking of the object. As
can be seen in Figure 4.2, the object is detected at the correct position in both cases. To have more
data to compare how well the object detection and tracking work, a set of test data has been collected.
Even though a lot of data had already been collected previously, those datasets were recorded with the
automated tracking already running. This means that as long as the tracking works, the object stays in
the centre of the image the whole time, and as soon as it stops working the object leaves the image very
quickly. For the test data the object was therefore tracked manually with a joystick. Figure 4.3 shows the
distance of both tracking algorithms to the true position. As can easily be noticed, the previous method
fails at tracking: with distances of more than 400 pixels the object detection is not on target any more.
In defence of the previous method it has to be stated that adjusting the threshold for the edge detection
could improve the results. But as soon as the background holds edges as strong as those of the object, it
is likely to fail. More details on the acquired test data and the performance on it are given in the next
section.

Figure 4.2 Tracking with OF still works in both cases where the previous method failed

Figure 4.3 Tracking the aircraft in the test data set: Previous method compared to the new approach

4.2 Results with Test Data

The test data has been collected from a station near Schleinikon in the Wehntal. Unfortunately
the camera sensor had dust and other particles on its surface. This contamination leads to
faulty results, because a particle on the CCD sensor is detected as a moving object. In Figure
4.4 the whole extent of the contamination is visible. Due to bad weather conditions and a short
amount of time, the collection of test data could not be repeated. At first it was tried to clean
up the image by deleting the contaminated pixels; however, this proved to be worse than just
ignoring the dirt and running the algorithm normally.

Figure 4.4 Dust and other particles on the CCD sensor of the camera

Out of the 6 sequences taken during the test data collection, one was chosen. Figure 4.5 shows a
few frames from different parts of the chosen sequence. For this sequence the dirt is less visible
due to the lighting conditions and the diverse background. The sequence consists of 347 frames.
For each of these frames the object centre was determined manually, as well as the OF at a stable
background pixel. The assumed accuracy of the manual object detection is 5 pixels. Figure 4.6
shows the distance between the calculated object position and the ground truth. It can be observed
that the highest peak reaches an offset of roughly 75 pixels. Considering that in the test data the
aircraft measures roughly 170 pixels in x direction and 50 pixels in y direction, these values count
as a successful detection. Because the aircraft does not have the same extent in x and y direction,
it is necessary to look at the displacement in those two directions separately. Figure 4.7 shows the
distance to the real object in x and y direction. As shown there, the peak at frame number 127 is
mainly due to a displacement in y direction, caused by a sudden vertical movement of the camera.
Besides this peak, the value never exceeds plus or minus 25 pixels. With a total size of 50 pixels,
this means the object was wrongly detected only for a very short period. In the x direction larger
displacements can be observed after frame number 200. This is due to sudden changes in the

(a) frame 007 (b) frame 107

(c) frame 207 (d) frame 307

Figure 4.5 Results tracking an aircraft at different frames of the test sequence

Figure 4.6 Difference between ground truth and tracked position

Figure 4.7 Difference between ground truth and tracked position split into the x and y component of
the total distance

horizontal movement of the camera, a short tracking of a dirt particle, and the aircraft almost
dropping out of the field of view. However, the values never exceed half of the x-extent of
the aircraft; therefore the object was detected correctly the whole time. Figure 4.8 shows these
incidents at frame numbers 127 and 254.
Due to the high amount of time necessary and the bad quality of most of the test data, the
ground truth was only determined for this single sequence.

(a) frame 127 (b) frame 254

Figure 4.8 Negative results tracking the aircraft in the test data

4.2.1 Influence of Grid Size

To minimise the time cost the grid size should be chosen as small as possible. To figure out
how small the grid can be while a reasonable segmentation is still possible, different grid sizes
have been tested. Figure 4.9 shows the deviation from the ground truth for the grid sizes 20, 32, 64, 128 and
200. It is obvious from these results that a grid size of 20x20 is not enough to detect and track
the object sufficiently. In Figure 4.10 only the larger four grids are shown. The differences from
one grid size to another in this figure are barely noticeable. The mean of the distances for the
32x32 grid is 20.1 pixels, while the 128x128 grid has a mean distance of 20.3 pixels. With these values
and the visual evaluation of Figure 4.10, the smallest working grid size is set to 32x32 pixels.

4.2.2 Influence of Corner Threshold

The influence of the threshold for the eigenvalues of the Hessian matrix is harder to determine
than the influence of the grid size. The quality of the picture naturally has a much higher
influence on this parameter. A good threshold for one image might not work for a different
setting (Figure 4.11). The left image in Figure 4.11 has a lot of noise in the seemingly homogeneous
background. This noise leads to high eigenvalues of the Hessian matrix and therefore
these pixels are falsely detected as corners. Presumably this noise is a result of the differently
set camera gain parameters, as the left picture was taken in much less daylight. Simply setting

Figure 4.9 Difference between ground truth and tracked position: Grid size 20x20 shows large differences
and is therefore not usable

Figure 4.10 Difference between ground truth and tracked position: Here only the gridsizes 32, 64, 128,
and 200 are used. The differences are all very small, therefore the 32x32 grid size is usable

(a) Threshold fails and noisy background is also detected as good corners for OF; (b) Threshold works and only good corners for OF are detected

Figure 4.11 The same threshold is used for two different pictures

the threshold to a higher value would not work, because then in the right image of this figure
no corners, or only very few, would be detected and the segmentation would therefore
fail. To tackle this problem, the image could be smoothed or different corner detection
techniques could be taken into consideration.

4.2.3 Camera Flow

Figure 4.12 shows the comparison between the camera flow and the true background flow. For
the most part they correspond well and it is possible to reduce the flow for the segmentation based on
absolute values. But at a few peaks the camera flow shows a much higher amplitude; the camera
flow is up to 15 pixels higher than the true background flow. With this offset the segmentation
fails and the background is detected as object and vice versa. As already mentioned in Chapter
3.2.4, this effect is likely caused by the linear interpolation of the angles.

4.3 Time cost


The processing time is evaluated for a grid size of 32x32, because it is assumed that this is the
smallest possible resolution at which the detection and tracking still works satisfyingly.
As Table 4.1 shows, the total time to execute one loop of the algorithm is roughly 250
milliseconds. This is not fast enough for the desired frequency of 20 Hertz. But the largest
part of this time is spent running the .bat file in MATLAB to call the C++ function. The actual
computation of the good corners, OF, segmentation, and Kalman filter takes only 70 milliseconds.
This is still too long, but it has to be considered that the segmentation is currently processed
in MATLAB. Because of its nature as a scripting language, MATLAB is much slower than a
compiled language such as C++. The maximum time left for the segmentation and Kalman
filtering is below 10 ms, which corresponds to a factor of roughly 0.33 of the current MATLAB running time.

Figure 4.12 Ground truth flow and camera flow

Detecting corners    10 ms     C++
Optical flow         30 ms     C++
Segmentation         30 ms     MATLAB
Kalman filter        0.4 ms    MATLAB
Rest                 180 ms    MATLAB
Total                250 ms    C++ and MATLAB

Table 4.1 Computation time of the algorithm components

Depending on the source, the speed-up factor ranges from 0.1 (Aruoba and Fernández-Villaverde,
2014) to 0.002 (Andrews, 2012). Of course these times depend on the set task and can therefore
not be directly transferred to the speed-up of the segmentation. But with the conservative
estimate of C++ being 10 times faster than MATLAB, the total time to execute the code would
come down to approximately 45 ms. Therefore it is reasonable to say that the time constraint
set for this thesis can be achieved.

4.4 Tracking Different Objects

For a measurement campaign together with the Norwegian ski federation, a downhill skier was
tracked during his run. The tracking was done manually with a joystick. To test how the
algorithm behaves with a completely different object, the data collected during this measurement
campaign was also processed with the newly developed algorithm. As can be seen in Figure 4.13,
the detection and tracking also works for the skier. Figure 4.13a is an example with corners in the
background (mast of a ski lift), 4.13b an example without background corners, and in 4.13c the skier
is partially hidden behind a gate.

(a) with background corners (b) without background corners

(c) partially obstructed view

Figure 4.13 Positive results tracking a skier

Even though the results look promising at first glance, there are also situations in which the
tracking fails. As seen in Figure 4.14a, there is again the problem of false corner detection, which
leads the segmentation to fail. Another problem with tracking a skier is that multiple moving

objects are possible (Figure 4.14b).

(a) bad corner detection (b) other moving object

Figure 4.14 Negative results tracking a skier

4.5 Discussion
The results of the new tracking algorithm show that it works. It does enhance the detection
and tracking compared to the currently used method; therefore, the goal of this thesis has been
reached. Even with a contaminated camera sensor it is still possible to detect and track the
aircraft with a satisfying accuracy. In addition to the enhancement of the object detection, it is a
very general approach. As demonstrated in Chapter 4.4, the shape of the object is not
relevant and it is also possible to track completely different objects, apart from aircraft. Besides
enhancing the detection and tracking, the algorithm has to work at a frame rate of 20
fps. Because a big part of the code is still in MATLAB, the question of whether it is fast enough
cannot be answered definitively. But as explained in Chapter 4.3, the estimate is that the algorithm
should run in less than 50 milliseconds.
However, there are drawbacks of the new method that have not been solved yet:

• The optimum set of parameters has not yet been found. The one implemented right now
shows good results, but it has not been tested extensively.
• The algorithm is also not fully implemented in C++ and therefore not embedded in the
current tracking software. This means the testing cannot be done under real conditions,
but only with a set of test data. Furthermore, the calculated position of the object is
not part of the feedback loop, and therefore no information on the performance of the tracking
algorithm with the feedback camera movement is available.
• The problem concerning the camera flow is not solved either. Even though its effects are
mostly removed by the adapting variances and the outlier detection, the camera flow would
have to be corrected in order to have a clean and even more robust solution.

5 Conclusion & Outlook
The goal of this thesis was to enhance the automated tracking with a more robust object
detection. To reach this goal, a new approach using optical flow was implemented. The
OF captures the motion between two frames, and the idea is that the motion of the object and
the background differ significantly. Based on this assumption, the calculated flow is segmented
into two clusters, one for the object and one for the background. To enhance the results and make them
more robust, they are filtered with a Kalman filter. The Kalman filter uses adapting observation
variances, depending on how reliable the results seem. Because the automated tracking needs
the tracking to be done in real time, processing time is of crucial importance. For this reason, the flow is
only calculated on a subset of the total number of pixels. As it turned out, an even grid with the
dimension of 32x32 is sufficient to calculate a reasonable segmentation into object and background.
So far the results of this method are very promising. The object detection and tracking could be
enhanced dramatically within the given amount of time.
However, there is still work to do in order to complete the task. With extensive testing, an
optimal set of parameters for the filtering has to be found. Especially the threshold for the corner
detection has to be optimised; preferably, a solution will be found that is able to adapt to changing
lighting conditions. Smoothing the image could also prove helpful in the future. After the testing,
it would be essential to implement the enhanced automated tracking into the QDaedalus tracking
software. With this, further tests would show the real capability of the new object detection and
tracking algorithm.
Looking further ahead, with the new method it might be possible to track any object, instead
of only aircraft. Considering this, a whole new field of possible applications for the QDaedalus
system could arise. There are a number of interesting tasks involving, for example, high velocity
trajectories (Boffi, 2016). These trajectories could, at least in theory, be calculated automatically
and in real time with a robust object tracking. While it is already possible to track other objects
with the new method, it is not yet reliable enough. To achieve a higher reliability, the camera flow
would somehow have to be adjusted to better fit the true flow of a stable background pixel, and the
adaptive corner detection would have to be implemented as well. With these problems fixed, the
field of applications for a contactless measurement system that can precisely determine the
position of even a highly dynamic object seems remarkable.

References
[1] T. Andrews. Computation time comparison between matlab and C++ using launch windows.
2012.

[2] S. B. Aruoba and J. Fernández-Villaverde. A comparison of programming languages in


economics. Technical report, 2014.

[3] J. L. Barron, D. J. Fleet, and S. S. Beauchemin. Performance of optical flow techniques.


International journal of computer vision, 12(1):43–77, 1994.

[4] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-Up Robust Features (SURF).
Computer Vision and Image Understanding, 110(3):346–359, jun 2008.

[5] G. Boffi. Dynamics-based system noise adaption of an extended Kalman filter for GNSS-only
kinematic processing. In ION GNSS+, 2016.

[6] G. Bradski and A. Kaehler. Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Inc., 2008.

[7] B. Bürki, S. Guillaume, P. Sorber, and H. P. Oesch. DAEDALUS: A versatile usable digital clip-on measuring system for Total Stations. In International Conference on Indoor Positioning and Indoor Navigation (IPIN), 2010.

[8] J. Canny. A Computational Approach to Edge Detection. In IEEE Transactions on Pattern


Analysis and Machine Intelligence, volume PAMI-8, pages 679–698, 1986.

[9] S. Conzett. Tracking of Fast Moving Objects Using Video Tachymetry. Master’s thesis,
ETH Zürich, 2014.

[10] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object Detection with


Discriminatively Trained Part-Based Models, 2010.

[11] A. Geiger, E. Favey, A. Filliger, and G. Beutler. Vorlesungsskript: Präzisionsnavigation.


ETH Zürich, 2013.

[12] H. Grabner and H. Bischof. On-line boosting and vision. In 2006 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition (CVPR’06), volume 1, pages
260–267. IEEE, 2006.

[13] C. Harris and M. Stephens. A combined corner and edge detector. In Alvey vision conference,
volume 15, page 50. Citeseer, 1988.

[14] B. K. P. Horn and B. G. Schunck. Determining optical flow. Artificial intelligence, 17(1-
3):185–203, 1981.

[15] Z. Kalal, K. Mikolajczyk, and J. Matas. Tracking-learning-detection. IEEE transactions on


pattern analysis and machine intelligence, 34(7):1409–1422, 2012.

[16] D. G. Lowe. Object recognition from local scale-invariant features. In Computer Vision,
1999. The Proceedings of the Seventh IEEE International Conference on, volume 2, pages
1150–1157 vol.2, 1999.

[17] B. D. Lucas and T. Kanade. An iterative image registration technique with an application
to stereo vision. In IJCAI, volume 81, pages 674–679, 1981.

[18] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir,


and L. V. Gool. A Comparison of Affine Region Detectors. International Journal of Computer
Vision, 65(1):43–72, 2005.

[19] T. Nüssli and R. Salzgeber. Automated high precision optical aircraft tracking. Master’s
thesis, ETH Zürich, 2015.

[20] I. Sobel. History and definition of the Sobel operator (An Isotropic 3x3 Image Gradient Operator). https://www.researchgate.net/publication/239398674, 2014. Accessed: 30.06.16.

[21] R. Szeliski. Computer vision: algorithms and applications. Springer Science & Business
Media, 2010.

[22] L. Van Gool, R. Szeliski, and V. Ferrari. Computer Vision. ETH Zürich, Zürich, 2011.

List of Figures

3.1 Edge detection method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7


3.2 The flowgraph of the implemented algorithm . . . . . . . . . . . . . . . . . . . . 9
3.3 Grid size and good corners to track . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.4 Reduced OF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.5 Histogram of the flow vectors absolute values . . . . . . . . . . . . . . . . . . . . 12
3.6 Histogram of the flow vectors directions . . . . . . . . . . . . . . . . . . . . . . . 13
3.7 Variance calculation example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4.1 Tracking with edge detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19


4.2 Tracking with OF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3 Comparison between old and new method . . . . . . . . . . . . . . . . . . . . . . 20
4.4 Dust and other particles on the CCD sensor of the camera . . . . . . . . . . . . . 21
4.5 Results tracking an aircraft at different frames of the test sequence . . . . . . . . 22
4.6 Difference between ground truth and tracked position . . . . . . . . . . . . . . . 23
4.7 Difference between ground truth and tracked position . . . . . . . . . . . . . . . 23
4.8 Negative results tracking the aircraft in the test data . . . . . . . . . . . . . . . . 24
4.9 Difference between ground truth and tracked position . . . . . . . . . . . . . . . 25
4.10 Difference between ground truth and tracked position . . . . . . . . . . . . . . . 25
4.11 The same threshold is used for two different pictures . . . . . . . . . . . . . . . . 26
4.12 Ground truth flow and camera flow . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.13 Positive results tracking a skier . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.14 Negative results tracking a skier . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

