
CMPE264: Image Analysis and Computer Vision

Final Project Report

UAV Video Stabilization

Mariano I. Lizárraga (mlizarra@ucsc.edu)

Sonia Arteaga (sarteaga@soe.ucsc.edu)

August 15, 2007


CMPE264: Image Analysis and Computer Vision

1 Introduction
Unmanned Aerial Vehicles (UAVs) have slowly started to permeate civilian
and law enforcement applications. Once thought to be exclusively a military
asset, UAVs are used today in border surveillance, whale and other sea
mammal tracking, and search and rescue missions in disaster areas. The
usefulness of the imagery that these flying robots relay to the controlling
ground stations in military applications is directly related to how much
information the operator can “extract” from the frames being watched in
real time. Image quality, RF noise rejection, and image stabilization have
come to play an important role in the overall performance measurement
of the mission. While many of the mainstream UAVs carry sophisticated
equipment on board to address the above mentioned problems, their price
tag is usually on the order of millions of dollars, making them unaffordable
for virtually any application other than a military one.
Recent advances in solid-state sensors and the overall miniaturization of
electronics have made it possible to noticeably improve the capabilities of
smaller UAVs. Nevertheless, these smaller UAVs are more sensitive to the
natural oscillations of the airplane and to wind turbulence, which degrades
the stability of the imagery served to the end user.
The Naval Postgraduate School (NPS) has been performing experimental
flights on tactical UAVs since 2001 in order to develop technology that
supports U.S. troops in different peace-keeping scenarios around the world.
These small UAVs carry visual and IR cameras used to relay video down
to a ground station, providing vital information for the deployed team. Even
though these UAVs have autopilots and robust control schemes implemented
on board, it is practically impossible to completely eliminate vibration and
oscillations due to external disturbances and the natural behavior of the
plane. These oscillations are mechanically transmitted to the camera; as a
consequence, the relayed video is difficult to watch and exhausting for the
operator to evaluate.
To address the issue of oscillations and low-frequency vibrations in the
recorded imagery, an image stabilization algorithm is required to improve
visual quality. Furthermore, the stabilization algorithm needs to be robust
and computationally inexpensive enough to run in real time on the PC104
computer available at the ground station.


2 Implementation
Image stabilization for moving platforms is usually concerned with the
compensation of unwanted high-frequency motion. Many widely known
algorithms, like the ones presented in [1], are very sensitive to panning and
rotation, rendering them useless for applications where intentional panning
and rotation are present.
The image stabilization algorithm and the Simulink implementation presented
herein follow directly the work presented in [2], which shows very stable
behavior under intentional panning and rotation. This algorithm offered
promising results in stabilizing the UAV footage provided by the Unmanned
Systems Lab at the Naval Postgraduate School. The implemented frame motion
compensation follows the one proposed in [3].
Simulink, a model-based engineering tool developed by The MathWorks
(makers of Matlab), was picked as the development platform for this project
due to its “block-oriented” design paradigm, which offers great ease of use
and a better understanding of each functional block of the algorithm.
The presented algorithm consists of five main functional blocks, shown in
Figure 1:

Figure 1: Top Level Simulink Diagram

• Video reading and grayscale conversion,
• Gray-Code calculation,
• Sub-frame correlation measure calculation,
• Global motion calculation, and
• Motion compensation.


2.1 Video Reading and Grayscale Conversion


The first step is to read the video frame by frame and convert each frame
into an 8-bit grayscale image, which is the actual input to the algorithm.
This is performed in Simulink with the block layout shown in Figure 2. Note
that before the output, the video stream is downsampled by two, completely
ignoring every other frame. This was done to improve throughput and
increase the frame rate of the output.

Figure 2: Grayscale Frame Conversion and Downsampling
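As an illustration only (the project itself implements this with Simulink
blocks), the same front end can be sketched in Python with OpenCV; the file
name uav.avi below is a placeholder, not the actual footage:

    import cv2

    cap = cv2.VideoCapture("uav.avi")  # placeholder file name
    frames = []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % 2 == 0:  # downsample by two: keep every other frame
            # Convert to the 8-bit grayscale image the algorithm operates on
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        index += 1
    cap.release()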

2.2 Gray-Code Calculation


This functional block decomposes the frame into 8 binary images a_k, called
bit plane images, such that the frame f at time t is given by [2]:

f^t(x, y) = a_{K-1} 2^{K-1} + a_{K-2} 2^{K-2} + \cdots + a_1 2^1 + a_0 2^0    (1)

Figure 3 shows the 8 bit plane decomposition of a given frame.


The next part of this functional block calculates the Gray code of successive
bit plane images. The Gray code, named after Bell Labs researcher Frank
Gray, is a binary numeral system in which two successive numbers differ in
only one digit [4]. The Gray-coded bit plane image is given by:

g_k = a_k \oplus a_{k+1},    0 \le k \le 6.    (2)

It is this Gray-coded image g_k that is passed on to the next functional
block.


Figure 3: 8 Bit Plane Frame Decomposition
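Equations 1 and 2 map directly onto bitwise operations. A minimal NumPy
sketch of the decomposition and Gray coding, assuming gray is an 8-bit
grayscale frame from the previous step:

    import numpy as np

    def gray_code_planes(gray):
        """Gray-coded bit planes of an 8-bit frame (Equations 1 and 2)."""
        # Bit plane decomposition (Equation 1): a[k] holds bit k of each pixel.
        a = [(gray >> k) & 1 for k in range(8)]
        # Gray coding (Equation 2): g_k = a_k XOR a_{k+1}, for 0 <= k <= 6.
        return [a[k] ^ a[k + 1] for k in range(7)]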

2.3 Sub-frame Correlation Measure


This functional block divides the gray-coded image g_k into four regions of
size M × N and defines a search window of size (M + 2p) × (N + 2p), which
is explored in turn to calculate the following correlation measure [2]:

C_j(m, n) = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} g_k^t(x, y) \oplus g_k^{t-1}(x + m, y + n)    (3)

C_j therefore acts as an accumulator counting the number of
non-correspondences between g_k^t and g_k^{t-1}; the smaller the value, the
better the match. Figure 4 shows the Simulink implementation of this
functional block.


Figure 4: Correlation Measure Calculation
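A brute-force sketch of Equation 3 in NumPy, assuming the region's top-left
corner (x0, y0) leaves at least a p-pixel margin inside the image (an
unoptimized illustration, not the Simulink implementation):

    import numpy as np

    def correlation_measure(g_t, g_prev, x0, y0, M, N, p):
        """Equation 3 for one region of one Gray-coded plane."""
        # C_j(m, n) counts the non-correspondences between the current region
        # and a shifted region of the previous frame; smaller is better.
        C = np.zeros((2 * p + 1, 2 * p + 1))
        region = g_t[x0:x0 + M, y0:y0 + N]
        for m in range(-p, p + 1):
            for n in range(-p, p + 1):
                shifted = g_prev[x0 + m:x0 + m + M, y0 + n:y0 + n + N]
                C[m + p, n + p] = np.mean(region ^ shifted)  # (1/MN) sum of XORs
        return C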

2.4 Global Motion Calculation


This functional block chooses the minimum of the correlation measure C_j for
each region (thus the best match); the coordinates of that minimum correspond
to the local motion vector V_j:

V_j = \arg\min_{(m, n)} C_j(m, n).    (4)

These motion vectors V_j are stacked together with the global motion vector
V_g^{t-1} from the previous frame and passed through a median filter to
obtain the current global motion vector V_g^t:

V_g^t = \mathrm{median}\{V_1^t, V_2^t, V_3^t, V_4^t, V_g^{t-1}\}.    (5)

Figure 5 shows the Simulink implementation of this functional block.

Figure 5: Global Motion Calculation
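Equations 4 and 5 amount to an arg-min per region followed by a
component-wise median. A sketch, assuming C_list holds the four correlation
matrices from the previous step and vg_prev is the previous global motion
vector:

    import numpy as np

    def global_motion(C_list, vg_prev, p):
        """Equations 4 and 5: local arg-min per region, then a median filter."""
        local = []
        for C in C_list:  # one correlation matrix per region
            # Equation 4: the offset of the minimum of C_j is the local vector.
            i, j = np.unravel_index(np.argmin(C), C.shape)
            local.append((i - p, j - p))  # map array indices back to offsets
        # Equation 5: component-wise median of the four local vectors and the
        # previous global motion vector.
        candidates = np.array(local + [vg_prev])
        return tuple(np.median(candidates, axis=0).astype(int))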

2.5 Motion Compensation


Since motion may originate from intentional panning, the global motion
vector V_g needs to be damped to allow for smooth panning:

V_a^t = D \, V_a^{t-1} + V_g^t    (6)

where D is a damping factor. With this accumulated motion vector, the
original frame is relocated to remove the unwanted motion while still keeping
intentional panning. Figure 6 shows the Simulink implementation of this
functional block. Note that before the output, a frame rate transition block
is included to keep the frame rate constant, taking into account the
downsampling mentioned in Subsection 2.1.

Figure 6: Motion Compensation
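A sketch of Equation 6 and the frame relocation; the damping factor D = 0.95
and the sign convention of the shift are assumptions, since the report does
not state either:

    import numpy as np

    def compensate(frame, vg, va_prev, D=0.95):
        """Equation 6 plus frame relocation (assumed D and sign convention)."""
        # Accumulated motion vector (Equation 6): damped to preserve panning.
        va = (D * va_prev[0] + vg[0], D * va_prev[1] + vg[1])
        dx, dy = int(round(va[0])), int(round(va[1]))
        # Shift the frame opposite to the accumulated motion. np.roll wraps
        # pixels around the border; a real implementation would crop or pad.
        return np.roll(frame, shift=(-dx, -dy), axis=(0, 1)), va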

3 Results
Several tests were run using different region sizes in order to quantify the
variability of the motion vectors, since their values affect the visual
quality of the motion-compensated video footage.
The block sizes ranged from a minimum of 6 × 6 pixels, in increments of 25,
up to a maximum of 106 × 106 pixels, and a fixed value of p = 8 was used in
Equation 3. The motion vectors of the first 16 frames were then plotted in a
bar graph to show the variability from frame to frame for each block size.
Figure 7 shows that the results for block sizes 56–106 contain practically
identical values, allowing us to reduce the size of the scanned region and
noticeably improve the output frame rate to 7 frames per second for the
analyzed footage.

Figure 7: Motion Vector Components for Different Values of N

Using 56 × 56 regions, we needed to verify that the stabilization was working
correctly. Therefore a small Simulink model, shown in Figure 8, was set up to
generate difference frames such that:

F_d = F_t - F_{t-1},    (7)

for both the original footage and the compensated footage. Figure 9 shows
three difference frames for a given sequence, demonstrating that the
compensated video is indeed much more stable than the original. Figure 10
shows the mean of each difference frame F_d for a segment of video footage.
It is clear from that figure that the compensated video does better than the
original in most cases.


Figure 8: Simulink Model of the Frame Difference Comparison

Figure 9: Difference Frames at Different Time Intervals

Figure 10: Mean of Difference Frames for a Segment of Video Footage
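The verification reduces to a per-frame subtraction. A sketch, assuming
frames is the list of grayscale frames built in Subsection 2.1 and
summarizing each difference frame by its mean absolute value (the absolute
value is an assumption; the report does not specify the summary used):

    import numpy as np

    def difference_means(frames):
        """Per-frame summary of Equation 7; lower means steadier video."""
        means = []
        for prev, curr in zip(frames[:-1], frames[1:]):
            # F_d = F_t - F_{t-1} (Equation 7), in a signed type to avoid wrap.
            fd = curr.astype(np.int16) - prev.astype(np.int16)
            means.append(float(np.abs(fd).mean()))
        return means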

4 Conclusion
From the implementation of the previously described image stabilization
algorithm, one can conclude the following:

• The Gray-Coded Bit Plane (GCBP) method described in [2] and [3] shows good
performance in stabilizing video footage that contains significant
intentional rotation and panning, as was the case with the UAV footage.

• The significance of the motion vector results mentioned in Section 3 is
that decreasing the block size also increases the speed of the motion
compensation implementation. The results above show that we can run the
model with a block size of roughly 56 × 56 pixels and still attain the same
level of quality as with a larger block size, but at a faster speed and with
reduced computational cost.

• The use of Simulink to implement the algorithm offered great insight into
each step of the algorithm and allowed us to test and debug each functional
block independently.

5 Acknowledgments
The authors of this Final Project would like to thank Dr. Vladimir
Dobrokhodov from the Naval Postgraduate School Unmanned Systems Lab for
providing us with several hours of UAV footage and for his invaluable support
in the Simulink implementation of this algorithm.

References
[1] J. Bergen, P. Anandan, K. Hanna, and R. Hingorani, "Hierarchical
Model-Based Motion Estimation," David Sarnoff Research Center, Princeton,
NJ, 1992.

[2] S. Ko, S. Lee, S. Jeon, and E. Kang, "Fast Digital Image Stabilizer Based
on Gray-Coded Bit Plane Matching," IEEE Transactions on Consumer Electronics,
Vol. 45, No. 3, August 1999.

[3] A. Brooks, "Real-Time Digital Image Stabilization," Image Processing
Report, Department of Electrical Engineering, Northwestern University,
Evanston, IL, 2003.

[4] "Gray Code," Wikipedia, the free encyclopedia,
http://en.wikipedia.org/wiki/Gray_code, November 2006.
