
Hardware Implementation of a Real-Time Lucas and Kanade Optical Flow

N. Roudel, F. Berry, J. Serot
LASMEA, 24 avenue des Landais, 63177 Aubière, France
{roudel,berry,serot}@univ-bpclermont.fr

L. Eck
CEA List, 8 route du Panorama, BP6, 92265 Fontenay-aux-Roses, France
Laurent.eck@cea.fr

Abstract
This paper presents an FPGA-based design that aims at applying real-time vision processes, in particular optical flow estimation. The main goal of this work is to be embedded in a micro air robot in order to provide critical information for autonomous flights. The motion field is indeed one of the dominant pieces of information for the safety of the robot. Based on this motion information, obstacle avoidance, for example, could be added to increase the degree of autonomy of the robot.


1 Introduction
For many years, numerous projects on the development of autonomous land or air robots (UAV, for Unmanned Aerial Vehicle) have been launched. Many reasons may explain such enthusiasm for these topics. Indeed, specific tasks can be unsafe or even impossible for a human (combat areas, nuclear radiation, hazardous areas, ...). However, in spite of the hostile environment, the robot's integrity must be ensured. For this reason, different strategies of navigation and exploration can be used. In most of these strategies, the knowledge of ego-motion and the measurement of potential moving targets is a keystone of numerous algorithms. The motion evaluation can be done by different kinds of sensors (inertial units, cameras, ...). Using a camera implies the computation of optical flow, which is defined as the pattern of apparent motion of objects, surfaces and edges in a visual scene caused by the relative motion between an observer and the scene. However, the extraction of optical flow has a high computational cost, and the usual strategy to evaluate optical flow with an air robot consists in sending (via wireless communication) the image stream and computing the motion on remote hardware. Once the computation is done, a safety strategy is elaborated from this information and the appropriate action is sent back to the robot. This implies that on long-distance flights (considering a static processing base on the ground), loss of communication can occur and the safety of the UAV is no longer guaranteed. Consequently, the autonomy of the robot's flight is highly limited. On the other hand, a UAV imposes several constraints such as weight and/or power consumption. The robot designer therefore has to select the best match between sensing devices, algorithms and processing hardware.

In this work, we propose to use a camera associated with an FPGA-embedded optical flow algorithm in order to measure the motion field around the robot. Some papers dealing with hardware implementations of optical flow computation can be found. For instance, [1] and [2] proposed optical flow implementations with a computation speed of about 20-30 FPS for a small image resolution (less than VGA). In our application, this speed is too low for safeguarding the robot. In [3], a high-speed real-time optical flow implementation is presented on a PCI-e card including two FPGAs, but this kind of work does not take the embedded aspect into account. Other papers, such as [4] and [5], proposed real-time optical flow estimation based on different algorithms.

The structure of this paper is as follows. In section 2, the Lucas and Kanade optical flow algorithm is presented and a few considerations on its implementation are proposed. The next sections (3 and 4) introduce the data flow design and propose the hardware implementation of the process. Finally, experimental results on realistic image sequences, obtained with an implementation on our smart camera (SeeMOS), are given in section 5.

2 Lucas and Kanade Algorithm


Following the work of Barron [6], which compares the performance of correlation-based, gradient-based, energy-based and phase-based optical flow extraction methods, the Lucas and Kanade method [7] has been chosen for its non-recursive nature and its low computational complexity combined with good accuracy. This method is a local method, which is only valid for small motions. The main source of errors of this algorithm is the well-known aperture problem [8]. In this gradient-based method, velocity is computed from first-order derivatives of the image brightness, using the motion constraint equation:

$$\frac{\partial I}{\partial x}\frac{dx}{dt} + \frac{\partial I}{\partial y}\frac{dy}{dt} + \frac{\partial I}{\partial t} = 0 \qquad (1)$$

where I denotes the image intensity and where we define the motion U = (u, v)^T with u = dx/dt and v = dy/dt. Using the notation Ix = ∂I/∂x, Iy = ∂I/∂y and It = ∂I/∂t, Eq. 1 can be written as:

$$\nabla I^T \cdot U = -I_t \qquad (2)$$

The Lucas and Kanade approach can be considered as a minimization problem. Indeed, this method applies the least squares method to a region of interest Ω in the image. The velocity of each pixel is computed by solving Eq. 4:

$$A^T W^2 A \, U = A^T W^2 b \qquad (3)$$

$$U = [A^T W^2 A]^{-1} A^T W^2 b \qquad (4)$$

where, for n points x_i of Ω at a single instant t:

$$A = [\nabla I(x_1), \ldots, \nabla I(x_n)]^T, \quad W = \mathrm{diag}[W(x_1), \ldots, W(x_n)], \quad b = -[I_t(x_1), \ldots, I_t(x_n)]^T$$

In order to avoid the non-invertibility of the matrix, a coefficient α is added on its diagonal, according to *******REFFFFFFFFFFFFFFFF**********, giving Eq. 5:

$$A^T W^2 A = \begin{pmatrix} \sum_{x \in \Omega} W^2(x) I_x^2(x) + \alpha & \sum_{x \in \Omega} W^2(x) I_x(x) I_y(x) \\ \sum_{x \in \Omega} W^2(x) I_x(x) I_y(x) & \sum_{x \in \Omega} W^2(x) I_y^2(x) + \alpha \end{pmatrix} \qquad (5)$$

$$A^T W^2 b = -\begin{pmatrix} \sum_{x \in \Omega} W^2(x) I_x(x) I_t(x) \\ \sum_{x \in \Omega} W^2(x) I_y(x) I_t(x) \end{pmatrix} \qquad (6)$$

where W represents a diagonal matrix which weights the constraints, with higher weights around the center of Ω.

Figure 1. Design of the flow. (Image sequence → memory swapping → gradients Ix, Iy, It → least square matrices building (A^T W^2 A, A^T W^2 b) → matrix inversion ([A^T W^2 A]^-1) → optical flow computation (u, v).)
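To make the per-pixel least-squares solve of Eqs. 4-6 concrete, here is a minimal software sketch (ours, not the authors' HDL; the window contents, the weight mask and the value of alpha are assumptions for illustration):

```python
import numpy as np

def lucas_kanade_pixel(Ix, Iy, It, W, alpha=1.0):
    """Solve Eq. 4 for one pixel: U = [A^T W^2 A]^-1 A^T W^2 b.

    Ix, Iy, It : n x n arrays of spatial/temporal gradients over the window
    W          : n x n array of weights (higher around the window center)
    alpha      : coefficient added to the diagonal to avoid non-invertibility
    """
    W2 = W ** 2
    # Eq. 5: 2x2 structure matrix, regularized on the diagonal
    a11 = np.sum(W2 * Ix * Ix) + alpha
    a12 = np.sum(W2 * Ix * Iy)
    a22 = np.sum(W2 * Iy * Iy) + alpha
    # Eq. 6: 2x1 right-hand side
    b1 = -np.sum(W2 * Ix * It)
    b2 = -np.sum(W2 * Iy * It)
    # Invert the 2x2 matrix through its determinant (as in section 3.1.3)
    det = a11 * a22 - a12 * a12
    u = ( a22 * b1 - a12 * b2) / det
    v = (-a12 * b1 + a11 * b2) / det
    return u, v
```

In the hardware, the five windowed sums are built by the pipeline of Fig. 2, and the 2×2 inversion is handled by the dedicated unit of section 4.2.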

3 Design
3.1 Data flow

The data flow of the proposed algorithm is divided into four major parts, as shown in Fig. 1. The first one is the shaping of the image data, used to provide the necessary data to the next step of the computation. The second is the computation of the 3D gradients (∂/∂x, ∂/∂y, ∂/∂t), applied to generate the primitive information for the optical flow estimation. The third consists in computing both matrices A^T W^2 A and A^T W^2 b. The last one is the optical flow computation itself.

3.1.1 Data Shaping Module

To perform the Lucas and Kanade optical flow estimation, two images are needed to compute It; on the other hand, Ix and Iy need three rows of the same image. For these reasons, the data shaping module controls the image flow using three memories, noted R1, R2, R3. To maintain a high level of synchronization, the input image is stored in a memory and a memory swapping process is applied, following the FSM of Tab. 1 (a behavioral sketch is given after the table). Indeed, two frames are necessary to compute the t-gradient. To perform the shaping of n rows of the same image, a FIFO memory whose size equals the image width is used.

3.1.2 3D-gradients computation

Since the previous module supplies only the useful information, the computation of the 3D gradients becomes easier. Indeed, to perform the t-gradient, a subtraction between the data coming from two memories is needed. In order to minimize the noise, a threshold is applied on this subtraction. For the x and y gradients, the previous module provides an n × n matrix on which a convolution is applied with a predefined mask.

Table 1. Memory swapping flow.


Current State | Action                                  | Next State
S0            | First writing in R1                     | S1
S1            | End of first writing in R1              | S2
S2            | First writing in R2                     | S3
S3            | End of first writing in R2              | S4
S4            | Writing in R3 and reading R2-R1         | S5
S5            | End of writing in R3 and reading R2-R1  | S6
S6            | Writing in R1 and reading R3-R2         | S7
S7            | End of writing in R1 and reading R3-R2  | S8
S8            | Writing in R2 and reading R1-R3         | S9
S9            | End of writing in R2 and reading R1-R3  | S4
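As a reading aid, here is a minimal Python model (ours, not the authors' HDL) of the circular schedule of Tab. 1: after the first two frames have been written, each new frame is written into one buffer while the two previous ones are read for the t-gradient. Only the buffer names come from the table; everything else is an assumption.

```python
def memory_swapping(frames):
    """Model of the Tab. 1 FSM over three frame buffers R1, R2, R3."""
    buffers = {"R1": None, "R2": None, "R3": None}
    # Steady-state schedule (states S4..S9): (write, read newer, read older)
    schedule = [("R3", "R2", "R1"), ("R1", "R3", "R2"), ("R2", "R1", "R3")]
    for i, frame in enumerate(frames):
        if i == 0:
            buffers["R1"] = frame          # S0/S1: first writing in R1
        elif i == 1:
            buffers["R2"] = frame          # S2/S3: first writing in R2
        else:
            w, rn, ro = schedule[(i - 2) % 3]
            newer, older = buffers[rn], buffers[ro]  # frames t-1 and t-2
            buffers[w] = frame             # write current frame
            yield newer, older             # pair read for It = newer - older
```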
Figure 2. Least Square Matrices Construction. (Each of the five products Ix², Ix·Iy, Iy², Ix·It and Iy·It goes through the same pipeline: data shaping → weighting → sum of all components.)

3.1.3 Optical flow computation

This module is the core of the design. The A^T W^2 A and A^T W^2 b matrices are first computed separately. Once this is done, the velocity is obtained by a multiplication between the 2×2 matrix [A^T W^2 A]^-1 and the 2×1 matrix [A^T W^2 b]. As for the previous module, a shaping stage is used to provide an n × n matrix of x gradients and another one of y gradients. Then the computation of [A^T W^2 A] is carried out with multiplications and additions. The next stage is the inversion of this 2×2 matrix to obtain [A^T W^2 A]^-1; hence, a division by the determinant has to be done. Once the u and v computations are done, this information is available for post-processing.
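Since A^T W^2 A is a symmetric 2×2 matrix, its inversion reduces to the classical adjugate-over-determinant formula (a, b, c denoting its entries as below), which is why the single division by the determinant is the critical operation of this stage:

$$\begin{pmatrix} a & c \\ c & b \end{pmatrix}^{-1} = \frac{1}{ab - c^2} \begin{pmatrix} b & -c \\ -c & a \end{pmatrix}$$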


4 Hardware Implementation
4.1 Least Square Matrices Building

To compute the motion field, the two matrices of Eq. 5 and Eq. 6 are needed. To build these matrices, five products are required: Ix², Ix·Iy, Iy², Ix·It and Iy·It. For each product, an n × n matrix is generated, each element is weighted with a higher weight for the central value, and then all the elements of each matrix are summed. For a computation with n × n matrices, the hardware cost is given by:

Memory: 5 × ImageWidth × DataWidth × (n−1) bits.
Multiplications: 5 + (5 × n²).
Additions: 5 × (n−1)².

For a good tradeoff between accuracy and hardware cost, the choice of a 3 × 3 matrix seems to be the most judicious. Indeed, for example with a 3 × 3 matrix for an 800 × 600 image resolution and a data width of 20 bits, 120 Kbits of memory, 50 multipliers and 20 adders are needed, against 240 Kbits, 130 multipliers and 80 adders for a 5 × 5 matrix. Given that the goal of the optical flow computation is to provide critical information on which other processes are applied, the use of a 5 × 5 matrix could compromise the integration of those processes.
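A small calculator for the cost formulas as stated above can help when sizing the window (a sketch of ours; the formulas are the paper's, the function name is not):

```python
def lk_window_cost(n, image_width=800, data_width=20):
    """Hardware cost of the least-squares matrix building stage
    for an n x n window, using the formulas stated above."""
    memory_bits = 5 * image_width * data_width * (n - 1)
    multiplications = 5 + 5 * n ** 2
    additions = 5 * (n - 1) ** 2
    return memory_bits, multiplications, additions

# The multiplier/adder counts match the paper's figures:
# n=3 -> 50 multipliers, 20 adders; n=5 -> 130 multipliers, 80 adders.
print(lk_window_cost(3))
print(lk_window_cost(5))
```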

4.2 Matrix inversion and integer to fixed-point conversion unit

To compute the optical flow, the 2 × 2 matrix representing Eq. 5 must be inverted. In order to have the same data width on the numerator and on the denominator, the determinant is computed and truncated. To perform the division, a function generated by the MegaWizard Plug-In Manager included in Altera's Quartus II software is used [13]. After timing simulation, if the pipelined mode is not used, the maximum speed of this block is not high enough, which makes it an important bottleneck for the design. Hence, the use of the pipelined mode, with a latency of one clock cycle, is necessary to keep the flow speed. To keep the accuracy of the computation, the division process shown in Fig. 3 is applied. Indeed, the quotient of the first division has only a few significant bits, due to the local estimation of the flow. Other bits are therefore needed to increase the accuracy: the remainder of the first division is multiplied by 2^p, where p represents the number of fractional bits, and divided again by the determinant.
Figure 3. Division Flow. (Numerator and denominator feed a first divider; its quotient goes to a concatenation stage while its remainder, multiplied by 2^p, feeds a second divider; both quotients are concatenated into the result.)
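The fixed-point refinement of Fig. 3 can be sketched as follows (a minimal model of ours, assuming unsigned operands and p fractional bits; the hardware uses two pipelined divider instances where this sketch uses the two `//` operators):

```python
def fixed_point_divide(numerator, denominator, p):
    """Two-stage division of Fig. 3: the integer quotient is refined by
    dividing the remainder, scaled by 2**p, to obtain p fractional bits."""
    quotient = numerator // denominator           # first divider
    remainder = numerator - quotient * denominator
    frac = (remainder << p) // denominator        # second divider, scaled remainder
    return (quotient << p) | frac                 # concatenation: integer . fraction

# Example: 7 / 3 with p = 4 fractional bits -> 0b100101, i.e. 10.0101
# in binary (2.3125, approximating 2.333...).
print(bin(fixed_point_divide(7, 3, 4)))
```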

5 Results
5.1 Smart Camera SeeMOS

In order to validate this optical flow estimation, a smart camera research platform named SeeMOS is used. The SeeMOS architecture, presented in Fig. 4, is designed around an FPGA, more precisely an Altera Stratix EP1S60. The FPGA plays the central role in the system, being responsible for interconnecting all the other hardware devices. Surrounding it, 10 Mb (5×2) of SRAM and 64 Mb of SDRAM are available for image and other data storage. The sensing board is composed, among others, of a CMOS imager (LUPA 4000) manufactured by Cypress. This 4-megapixel CMOS active pixel sensor features a synchronous shutter and a maximum frame rate of 15 fps at full resolution (2048×2048). The readout speed can be boosted by means of sub-sampling and windowed Region Of Interest (ROI) readout. High dynamic range scenes can be captured using the double and multiple slope functionality: the dynamic optical range can be up to 67 dB in single slope operation and up to 90 dB in multiple slope operation. Finally, the communication between the smart camera and a PC is realized by the communication board using a FireWire (IEEE 1394) link.

Figure 4. Architecture of the SeeMOS smart camera.

The main clock of the IEEE 1394 link, which sends one byte per cycle, is 20 MHz. In the case presented in this paper, 2 clock cycles are needed to obtain the velocity of one pixel. Thus, for a frame with a resolution of 500×500, the obtained frame rate is 40 frames per second (20×10^6 / (500 × 500 × 2) = 40). In full resolution (2048×2048), the frame rate is around 2.4 FPS. In order to display the motion field, a C++ library, CImg [12], is used: it receives the u and v values of each pixel, builds the motion field and displays it. [9], [10] and [11] give more details about the SeeMOS platform.

5.2 Experimental results

To discuss the accuracy of the system, the classical set of image sequences of the domain has been used. Indeed, without this set, accuracy estimation would be impossible, because the true flow of a real scene image sequence is unknown. Thus, the Translating Tree and Diverging Tree sequences have been loaded into the external memories. To evaluate the performance of the presented system, two error measurements are applied. The first one is the Average Angular Error (AAE), Eq. 7. The angular error is the angle between the correct flow v_c = (u_c, v_c, 1)^T / sqrt(u_c² + v_c² + 1) and the estimated flow v_e = (u_e, v_e, 1)^T / sqrt(u_e² + v_e² + 1). The second one is the Root Mean Square error (RMS), Eq. 8.

$$AAE = \frac{1}{H \cdot W} \sum \arccos(\vec{v}_c \cdot \vec{v}_e) \qquad (7)$$

$$RMS = \sqrt{\frac{1}{H \cdot W} \sum \left( (u_c - u_e)^2 + (v_c - v_e)^2 \right)} \qquad (8)$$

where H and W represent the height and the width of the image.

Sequence         | AAE  | RMS | Density | Parameters
Translating Tree | 1.43 | 0.2 | 100%    | α = 1
Diverging Tree   | 7    | 2.1 | 100%    | α = 1

Table 2. Translating and Diverging Tree error measures.

Tab. 3 shows the synthesis report for an image resolution of 800 × 600. As said before, the choice of a 3 × 3 least-square matrix building stage is the most judicious; even so, this stage uses 37% of the FPGA LEs. The whole optical flow computation reaches 42% of the device, which is not negligible when post-processes using this motion information have to be implemented as well.

Block                          | Number of LEs | Memory bits   | Maximum Clock Frequency
Data Shaping                   | 39 (<1%)      | 12768 (<1%)   | 176 MHz
Gradients Computation          | 27 (<1%)      | 0             | 66 MHz
Least Square Matrices Building | 21094 (37%)   | 143280 (3%)   | 176 MHz
Matrix Inversion               | 2405 (4%)     | 0             | 15 MHz
Optical Flow Computation       | 80 (<1%)      | 0             | 35 MHz

Table 3. Synthesis report.
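For reference, the two error measures of Eqs. 7 and 8 can be computed as in the following sketch (ours; NumPy-based, with the dot product clipped for numerical safety and the AAE returned in degrees, as reported in Tab. 2):

```python
import numpy as np

def flow_errors(uc, vc, ue, ve):
    """Average Angular Error (Eq. 7) and RMS error (Eq. 8) between a
    correct flow field (uc, vc) and an estimated one (ue, ve),
    all given as H x W arrays."""
    H, W = uc.shape
    # 3D direction vectors (u, v, 1), normalized as in the text
    nc = np.sqrt(uc**2 + vc**2 + 1.0)
    ne = np.sqrt(ue**2 + ve**2 + 1.0)
    dot = (uc * ue + vc * ve + 1.0) / (nc * ne)
    aae = np.sum(np.arccos(np.clip(dot, -1.0, 1.0))) / (H * W)
    rms = np.sqrt(np.sum((uc - ue)**2 + (vc - ve)**2) / (H * W))
    return np.degrees(aae), rms
```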

6 Conclusion and Future Research


In this paper, an FPGA-based system computing the well-known Lucas and Kanade optical flow algorithm has been presented. The main goal of this system is to provide the motion field in real time. One of the most important aspects is the speed of the computation: the presented processing aims at being embedded on an air robot flying at over 15 km/h, so the motion field has to be refreshed at a high rate. The optical flow computation is the first step of a project on a flying methodology including time-to-contact estimation, obstacle avoidance and autonomous navigation. The main goal of future work is to propose a reconfigurable architecture dedicated to the autonomous flight of air robots.


References
[1] M.V. Correia and A. Campilho: A Pipelined Real-Time Optical Flow Algorithm, ICIAR, 2004.

[2] J. Diaz, E. Ros, F. Pelayo, E.M. Ortigosa and S. Mota: FPGA-Based Real-Time Optical-Flow System, IEEE Trans. Circuits and Systems for Video Technology, 2006.

[3] Hiroshima University Robotics Laboratory website: http://www.robotics.hiroshima-u.ac.jp/hyper_human_vision/opt_flow-e.html

[4] P.C. Arribas: Real Time Hardware Vision System Applications: Optical Flow and Time to Contact Detector Units, IEEE International Caracas Conference on Devices, Circuits and Systems, 2004.

[5] M.M. Abutaleb, A. Hamdy, M.E. Abuelwafa and E.M. Saad: A Reliable FPGA-Based Real-Time Optical-Flow Estimation, International Journal of Computer Systems Science and Engineering, 2008.

[6] J.L. Barron, D.J. Fleet and S.S. Beauchemin: Performance of Optical Flow Techniques, International Journal of Computer Vision, 1994.

[7] B.D. Lucas and T. Kanade: An Iterative Image Registration Technique with an Application to Stereo Vision, Proceedings of the 7th International Joint Conference on Artificial Intelligence, 1981.

[8] K. Nakayama and G. Silverman: The Aperture Problem I, Vision Research 28, 1988.

[9] P. Chalimbaud and F. Berry: Embedded Active Vision System Based on an FPGA Architecture, EURASIP Journal on Embedded Systems, 2007.

[10] F. Dias, F. Berry, J. Serot and F. Marmoiton: Hardware, Design and Implementation Issues on a FPGA-Based Smart Camera, First ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC '07), 2007.

[11] SeeMOS project website: http://wwwlasmea.univ-bpclermont.fr/Personnel/Francois.Berry/seemos.htm

[12] CImg website: http://cimg.sourceforge.net/

[13] Quartus II divider datasheet: http://www.altera.com/literature/ug/ug_lpm_divide_mf.pdf
