
Targetless Extrinsic Calibration Between Event-Based and RGB Camera for Intelligent Transportation Systems

Christian Creß*, Erik Schütz, Bare Luka Žagar, Alois C. Knoll

All authors are with the Chair of Robotics, Artificial Intelligence and Real-time Systems, TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany. E-mail: christian.cress@tum.de, erik.schuetz@tum.de, bare.luka.zagar@tum.de, knoll@in.tum.de. *Corresponding author.
2023 IEEE Intelligent Vehicles Symposium (IV) | 979-8-3503-4691-6/23/$31.00 ©2023 IEEE | DOI: 10.1109/IV55152.2023.10186538

Abstract—The perception of Intelligent Transportation Systems is mainly based on conventional cameras. Event-based cameras have a high potential to increase detection performance in such sensor systems. Therefore, an extrinsic calibration between these sensors is required. Since a target-based method with a checkerboard on the highway is impractical, a targetless approach is necessary. To the best of our knowledge, no working approach for targetless extrinsic calibration between event-based and conventional cameras in the domain of ITS exists. To fill this knowledge gap, we provide a targetless approach for extrinsic calibration. Our algorithm finds correspondences of the detected motion between both sensors using deep learning-based instance segmentation and sparse optical flow. Then, it calculates the transformation. We were able to verify the effectiveness of our method during experiments. Furthermore, our results are comparable to existing multi-camera calibration methods. Our approach can be used for targetless extrinsic calibration between event-based and conventional cameras.

Index Terms—Targetless Calibration, Event-Based Cameras, RGB Cameras, Sensor Fusion, Intelligent Transportation Systems

Fig. 1: An RGB camera enables the acquisition of the structure of an object. On the other hand, an event-based camera offers an extremely high temporal resolution and a very high dynamic range. An extrinsic calibration is necessary to perform accurate data fusion and in order to benefit from the advantages of both cameras.

I. INTRODUCTION

Detailed detection of traffic participants is essential for an Intelligent Transportation System (ITS) for the creation of high-quality digital twins. Previous ITS perception has mainly been based on conventional RGB cameras [1]. In poor visibility, e.g., fog or at night, the detection performance decreases. Event-based cameras can make a very important contribution here: They perceive changes in brightness from each pixel on the image sensor asynchronously. Due to this technique, they have very high temporal resolution and very high dynamic range (140 dB versus 60 dB of standard cameras) [2], which means they can improve detection performance in such poor visibility situations. Currently, event-based cameras are used in the areas of moving object detection and tracking [3], traffic surveillance [4], and HDR image reconstruction [5]. For this reason, it is only a matter of time before event-based cameras spread in the field of ITSs. To be able to use the potential of event-based cameras in combination with other sensor systems, multimodal sensor fusion is required. Therefore, the targetless calibration with other sensor systems on the ITS must be solved.

Because event-based cameras work in a fundamentally different way, novel methods for basic problems for this sensor in computer vision are required [2]. For this reason, the intrinsic and extrinsic calibration is performed in previous research with modified checkerboards [6]–[10] or conventional checkerboards in combination with image reconstruction [11]. Due to wind and weather effects, ITSs should be calibrated regularly. Since a target-based calibration with a checkerboard on a highway is impractical, targetless approaches are necessary. As event-based camera images are without sufficient texture information, established targetless methods for multi-camera systems cannot be used [6], [12].

To the best of our knowledge, no targetless approach for extrinsic calibration between the event-based camera and the standard camera in the domain of ITS exists. Although an
approach for a moving robotic platform with event-based
cameras has been developed by [12], it cannot be used for ITS.
It is unsuitable because it is based on ego-motion, requires a
short baseline between the cameras, and has problems with
dynamic objects.
Therefore, the purpose of this paper is to provide a novel tar-
getless approach for extrinsic calibration between event-based
and conventional cameras. Our pipeline can be briefly summa-
rized as follows: Event-based cameras respond to brightness changes in the scene, which can be interpreted as motion. That is why we extract motion from the conventional camera image and find correspondences to the event-based image. As shown in Figure 1, we calculate a suitable transformation between the two modalities and obtain the extrinsic calibration. In addition, we show the effectiveness of our method, analyze the limits, and outline the direction of our future work.

Fig. 2: The problem of targetless extrinsic calibration of a multi-camera system can be solved using image registration. Since images from event-based cameras are blurry and inaccurate, and texture information is lost, existing approaches for multi-camera systems do not work. In this sequence, no meaningful result was achieved using SIFT in conjunction with a FLANN-based matcher.

In summary, the contributions of this work are:
• A brief description of existing calibration approaches for event-based cameras with other sensor modalities as well as a quick explanation of targetless extrinsic calibration of conventional cameras in general.
• A novel method for targetless extrinsic calibration between event-based cameras and conventional cameras, which is more pragmatic for use in ITS than existing target-based approaches.
• Qualitative and quantitative results which 1) are comparable to existing targetless methods for extrinsic calibration of conventional cameras and 2) show the effectiveness of our method for use in ITS.

II. RELATED WORK

First, we provide a brief overview of the functionality of event-based cameras and previous approaches to calibrating event-based cameras with other sensor modalities, such as other camera systems or Lidar. If the event-based camera has a frame-based active pixel sensor (APS) integrated into the device, which is spatially aligned with the event sensor, frame-based images can be generated with the event-based camera and established calibration methods can be used [13]. In this case, further procedures for calibrating the event-based camera with the conventional camera are not necessary. However, APS sensors are not installed in most other event-based camera models. For this reason, calibration between conventional cameras and event-based cameras is necessary.

A. Event-based cameras

Event-based cameras perceive changes in brightness from each pixel on the image sensor asynchronously. [2] and [11] describe this behavior as follows: Event-based cameras have pixels that are independent and respond to changes in the continuous log brightness signal L(u_k, t). In a noise-free scenario, an event e_k = (u_k, t_k, p_k) is triggered at pixel u_k = (x_k, y_k)^T at time t_k when the brightness change ΔL(u_k, t_k) since the last event at the same pixel reaches a threshold C, i.e.,

ΔL(u_k, t_k) = L(u_k, t_k) − L(u_k, t_k − Δt_k) = p_k C,   (1)

where C > 0, and Δt_k is the time elapsed since the last event at the same pixel, with the sign of the change p_k ∈ {+1, −1} also called the polarity of the event. This change in brightness can be interpreted as motion in the scene.
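To make the event model in (1) concrete, the following minimal Python sketch (our illustration, not code from the paper) turns the difference between two consecutive log-brightness frames into a list of events for an assumed contrast threshold C. A real sensor does this asynchronously per pixel, so this frame-based approximation is only for intuition:

```python
import numpy as np

def events_from_log_frames(log_prev, log_curr, C=0.2):
    """Illustrative event generation following Eq. (1): a pixel emits an event with
    polarity p_k = sign(dL) once |dL| reaches the contrast threshold C (assumed value)."""
    delta_L = log_curr - log_prev                   # ΔL(u_k, t_k)
    triggered = np.abs(delta_L) >= C                # threshold reached
    polarity = np.sign(delta_L).astype(np.int8)     # p_k ∈ {+1, −1}
    ys, xs = np.nonzero(triggered)
    return [(int(x), int(y), int(polarity[y, x])) for y, x in zip(ys, xs)]
```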
B. Target-based calibration

The extrinsic calibration of an event-based camera with a conventional camera using a checkerboard pattern as a calibration target was performed in [11]. Here, the checkerboard was moved in front of the cameras and the image of the event-based camera was reconstructed using [5]. The cameras were then calibrated to the corner points of the checkerboard using conventional OpenCV calibration methods. In [14], the camera was also moved and calibrated with a classic checkerboard pattern. In approach [6], a checkerboard pattern was illuminated with a flashlight so that an extrinsic calibration could be performed.

As an alternative to classic checkerboard patterns, [7] and [9] used flashing LED patterns; on the other hand, [8] and [10] used flashing screens showing a checkerboard pattern. The authors of [15] have implemented a similar solution: The intrinsic and extrinsic calibration of a trinocular setup consisting of an RGB camera, a thermal camera, and the event-based camera is realized with a cardboard and a computer screen. Here, the cardboard containing the calibration pattern is placed in front of the screen. A black-and-white blinking animation is then played on the screen. All sensors can be calibrated by the pattern, the animation, and the heat emitted by the monitor. Calibration patterns are also used in the calibration between event-based cameras and Lidars: [16] have introduced an extrinsic calibration pipeline between event-based cameras and Lidars. With it, a new 3D marker is used, which consists of a screen and planar markers with four circular holes. The weakness of these state-of-the-art methods is that they require a calibration target. Hence, this approach is quite impractical for calibrating sensors that are installed on roads and highways and are therefore exposed to environmental influences.

[Fig. 3 block diagram. Pre-processing event-based camera: I_eb(t) → noise removal → morphological operation. Pre-processing conventional camera: I_rgb(t−1), I_rgb(t) → movement detection, object mask detection, edge detection. Image registration and calibration: I_combined(t) → coarse alignment → refinement alignment.]
Fig. 3: Our proposed pipeline is based on pre-processing of the camera data followed by image registration.

C. Targetless calibration

Targetless methods based on image registration are more suitable. Here, the problem of the domain gap has to be considered: Event-based cameras can produce a gray image by accumulating their event stream. Since this image is relatively blurred and imprecise, and important texture information of the scene is lost, established methods such as SIFT [17] cannot be used here [6], [12]. As seen in Figure 2, existing targetless calibration methods for multi-camera systems are unsuitable.

For this reason, [12] has developed a targetless approach that measures the correlation between the event-based camera and the RGB camera based on the brightness change between the pixels. However, this approach requires that the cameras be close to each other while measuring their ego-motion on a robotic platform. Furthermore, the proposed method probably has weaknesses when dealing with dynamic objects. Therefore, it is rather unsuitable for us. Nevertheless, there is one thing that all targetless calibration methods have in common: geometric or physical similarities in the data between the different sensor modalities must be found [18]. For example, in the approach of [19], the visibility of the laser beam in the event-based image is used to perform an extrinsic calibration between Lidar and the event-based camera.

In addition, image registration can also be considered as a transformation problem between two point clouds [20]. To do this, the points must first be extracted from the sensor data and then a matching algorithm must be carried out. The registration between the RGB and event-based camera would be a cross-source registration. This can be executed, for example, with optimization-based registration methods. The advantages of these methods are guaranteed mathematical convergence, no required training data, and good generalization to unknown scenes.

The challenges are noise and outliers, partial overlaps, and differences in density and scale. The principle of the Iterative Closest Point (ICP) algorithm [21] would be a simple optimization method that computes correspondence and transformation for the targetless calibration with image registration. For example, the approach of [22] calibrates a stereo camera, thermal camera, and Lidar targetless using ICP. The stereo camera in combination with the Lidar allows point cloud registration in 3D with absolute depth information. Further targetless extrinsic calibration approaches for conventional cameras are [23] and [24]. These approaches search for correspondences between the image pairs and perform image registration. From there, they calculate the homography matrix and decompose the extrinsic parameters up to scale.
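For reference, the ICP principle mentioned above can be reduced to alternating a nearest-neighbour correspondence search with a closed-form transformation update. The following minimal 2D sketch is our illustration of that principle (estimating a similarity transform), not the registration used later in this paper:

```python
import numpy as np

def icp_2d(src, dst, iterations=20):
    """Minimal 2D ICP sketch: aligns point set src (N,2) to dst (M,2).
    Returns a 3x3 homogeneous matrix combining scale, rotation and translation."""
    T = np.eye(3)
    pts = src.astype(np.float64)
    for _ in range(iterations):
        # 1) Correspondence: nearest neighbour in dst for every point of pts.
        d2 = ((pts[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)].astype(np.float64)
        # 2) Transformation: closed-form similarity estimate (Umeyama-style).
        mu_p, mu_m = pts.mean(0), matched.mean(0)
        p, m = pts - mu_p, matched - mu_m
        U, S, Vt = np.linalg.svd(m.T @ p)
        D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])   # avoid reflections
        R = U @ D @ Vt
        s = (S * np.diag(D)).sum() / (p ** 2).sum()
        t = mu_m - s * R @ mu_p
        pts = (s * (R @ pts.T)).T + t
        step = np.eye(3)
        step[:2, :2], step[:2, 2] = s * R, t
        T = step @ T
    return T
```

Real implementations add outlier rejection and a convergence check; the paper additionally estimates a projective (homography) model in its ICP step, which this rigid-plus-scale sketch does not cover.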
III. METHODOLOGY

This section describes our novel algorithm for targetless extrinsic calibration between event-based cameras and conventional cameras. The image content of event-based cameras indicates motion in the scene. In conventional cameras, however, motion is represented by the optical flow. As previously mentioned, targetless camera calibration is mostly based on image registration. Thus, our calibration pipeline finds a cor-

respondence of the detected motion between the event-based
camera and the conventional camera and calculates a projective
transformation of both sensor modalities. We can divide this
task into three parts:
1) Pre-processing of the gray image, which is based on
accumulated sensor data from the event-based camera.
2) Pre-processing of the conventional camera image.
3) Finding image correspondences for image registration,
from which the extrinsic parameters can be derived.
Figure 3 gives an overview of the whole approach.

Since our algorithm estimates the sensor cross-calibration with motion correspondence, a few assumptions have to be made: The event-based camera and the conventional camera have to spot the identical dynamic object from almost the same perspective. The object must be at a sufficient distance from the camera system so that it can be considered a planar surface. In the current version of our algorithm, only one dynamic object, which can be detected with the YoloV7 instance segmentation [25]–[27], is allowed to be in the field of view of all cameras. Last but not least, the images have to be time synchronized. In practice on an ITS, these assumptions would be given if a single vehicle drove through the field of view of the sensors.

A. Pre-processing event-based camera

Event-based cameras send information about changes in the brightness of each pixel asynchronously. Since the amount of raw data of an event-based camera can be very large, we accumulate the brightness changes over the last 5000 µs in a grayscale image on the camera hardware. This process saves limited network resources on the ITS. In this grayscale image, white areas indicate an event polarity of +1, and black areas of −1. Gray areas in the image show no motion.

To reduce complexity, we ignore polarity in our approach. Therefore, we convert the grayscale image into a binary mask which only distinguishes between dynamic and static objects: Static areas in the image have black pixels, and white pixels indicate a dynamic image area. After conversion, we remove salt-and-pepper noise from the binary image using a median filter with kernel size ksize = 3 × 3.

The size of the white area is an indication of the movement speed of the dynamic object. Since we only need the exact edges in the image, we analyze, similarly to [28], for the presence of an edge. To achieve maximum runtime performance, we extract the edges using a combination of efficient morphological operators. In detail, we use hit-miss structuring elements (kernels K ∈ R^{3×3}) for edge detection in the vertical, horizontal, positive diagonal, and negative diagonal directions as follows:

K_verti = [0 1 0; 0 1 −1; 0 1 0],   K_horiz = [0 −1 0; 1 1 1; 0 0 0],
K_diag1 = [0 −1 1; 0 1 0; 1 0 0],   K_diag2 = [1 −1 0; 0 1 0; 0 0 1].   (2)

To receive a stable edge image E ∈ R^2, we combine the edge images, based on the kernels mentioned, with

E = E_verti + E_horiz + E_diag1 + E_diag2.   (3)

As a final step, we perform a dilation with kernel size ksize = 3 × 3 on the edge image. Figure 4 shows the result of the pre-processing.

Fig. 4: Pre-processing pipeline of an event-based image. (a) Input image, (b) accurate motion edges.
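A possible OpenCV rendering of this pre-processing step is sketched below. It is our illustration under stated assumptions: the neutral-gray threshold of 30 and the function name are ours, while the hit-miss kernels are those of Eq. (2):

```python
import cv2
import numpy as np

def preprocess_event_image(gray):
    """Event frame to motion edges: gray ~127 means static, brighter/darker pixels are events."""
    # Ignore polarity: any pixel deviating from neutral gray is marked as dynamic (Sec. III-A).
    dynamic = (np.abs(gray.astype(np.int16) - 127) > 30).astype(np.uint8) * 255
    dynamic = cv2.medianBlur(dynamic, 3)          # 3x3 median filter against salt-and-pepper noise

    # Hit-miss structuring elements for vertical, horizontal and diagonal edges, Eq. (2).
    kernels = [
        np.array([[0, 1, 0], [0, 1, -1], [0, 1, 0]], dtype=np.int32),
        np.array([[0, -1, 0], [1, 1, 1], [0, 0, 0]], dtype=np.int32),
        np.array([[0, -1, 1], [0, 1, 0], [1, 0, 0]], dtype=np.int32),
        np.array([[1, -1, 0], [0, 1, 0], [0, 0, 1]], dtype=np.int32),
    ]
    edges = np.zeros_like(dynamic)
    for k in kernels:                             # combine the four edge images, Eq. (3)
        edges = cv2.bitwise_or(edges, cv2.morphologyEx(dynamic, cv2.MORPH_HITMISS, k))
    return cv2.dilate(edges, np.ones((3, 3), np.uint8))   # final 3x3 dilation
```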
B. Pre-processing conventional camera

The goal of the pre-processing of the conventional camera image is to make it as similar as possible to the event-based edge image. For this purpose, motion from image I_t ∈ R^2 and the previous image I_{t−1} is calculated. To enable a real-time calculation of motion, we first use the OpenCV function “calcOpticalFlowPyrLK”, which implements a sparse iterative version of the Lucas-Kanade optical flow in pyramids [29], and second, the Good Features To Track method by J. Shi and Tomasi [30]. Flow vectors in a static camera can be induced by moving objects or by environmental conditions, e.g., wind gusts. These vectors differ mainly in their length. Therefore, inspired by [31], we analyze the flow vectors in terms of their angle and length. A vector v ∈ R^2 with a specific length l is assigned to camera motion, e.g., due to wind, as follows:

v_l < (m + C).   (4)

Here, m is the median of the length of all optical flow vectors, and C ∈ R is a constant value, e.g., 0.5. The other flow vectors were generated by dynamic objects.

With the flow vectors induced by camera motion, we can perform an affine transformation with the OpenCV method “warpAffine” for motion compensation. Then, we can extract the motion in the camera image with the image difference I_t − I_compensated. To further refine the result, we also use a KNN background subtractor [32] and combine the two motion masks via a bitwise AND operation.

Due to low texture in some parts of the object, important object contours are lost as a result of the movement extraction mentioned. For this reason, we use deep learning-based instance segmentation provided by YoloV7 [25]–[27]. The detected object that contains the most movement is the dynamic object. This ratio can be calculated as follows:

r = m_i / d_i   for i = 0, ..., n,   (5)
with m_i the number of pixels in the motion mask, d_i the number of pixels in the instance segmentation mask, and n the number of detected objects.

Finally, Canny edge detection is performed on the entire image, which is combined with the extracted motion via a bitwise AND operation. This is how we get clear contours of a moving object in a camera image. Now the image content between event-based and conventional cameras looks sufficiently similar for image registration. Figure 5 illustrates the robust estimation of camera movement.

Fig. 5: Pre-processing pipeline of an RGB image. (a) Categorized optical flow, (b) motion compensation, (c) motion compensation with KNN, (d) instance segmentation, (e) accurate motion mask, (f) accurate motion edges.
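The conventional-camera branch could look roughly as follows in OpenCV. This is a sketch under assumptions: the corner-detector parameters, the difference threshold of 25, and the use of cv2.estimateAffinePartial2D for the affine fit are our choices, and the YoloV7 instance-segmentation gating of Eq. (5) is omitted:

```python
import cv2
import numpy as np

def motion_edges_rgb(prev_gray, curr_gray, knn, C=0.5):
    """knn is a persistent cv2.createBackgroundSubtractorKNN() fed with the image stream."""
    # Sparse Lucas-Kanade optical flow [29] on Shi-Tomasi corners [30].
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500, qualityLevel=0.01, minDistance=7)
    p1, st, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None)
    p0, p1 = p0[st == 1], p1[st == 1]

    # Classify flow vectors by length: short vectors are attributed to camera motion, Eq. (4).
    lengths = np.linalg.norm(p1 - p0, axis=1)
    camera = lengths < (np.median(lengths) + C)

    # Compensate the camera motion with an affine warp, then take the image difference.
    A, _ = cv2.estimateAffinePartial2D(p0[camera], p1[camera])
    compensated = cv2.warpAffine(prev_gray, A, curr_gray.shape[::-1])
    diff_mask = (cv2.absdiff(curr_gray, compensated) > 25).astype(np.uint8) * 255

    # Refine with the KNN background subtractor [32]; combine both masks (bitwise AND).
    motion = cv2.bitwise_and(diff_mask, knn.apply(curr_gray))

    # Keep only the contours of the moving object: Canny edges AND motion mask.
    return cv2.bitwise_and(cv2.Canny(curr_gray, 100, 200), motion)
```

In the full pipeline described above, the YoloV7 instance mask with the highest ratio r from Eq. (5) would additionally gate this result before registration.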

C. Image registration and calibration


First, the cameras could have significant differences in the position or focal length of the lenses. Therefore, a coarse estimation of the transformation T_coarse ∈ R^{3×3} between the two sensor modalities must be made. For this purpose, a bounding rectangle around the motion masks of the event-based camera, r_eb, and of the conventional camera, r_rgb, is calculated. From this, a coarse displacement t ∈ R^2 and scaling s ∈ R^2 can initially be calculated:

s_x = r_rgb_w / r_eb_w,   s_y = r_rgb_h / r_eb_h,
r_eb_scaled_x = s_x · r_eb_x,   r_eb_scaled_y = s_y · r_eb_y,   (6)
t_x = (−1 · r_eb_scaled_x) + r_rgb_x,
t_y = (−1 · r_eb_scaled_y) + r_rgb_y,

⇒ T_coarse = [s_x 0 t_x; 0 s_y t_y; 0 0 1].   (7)

The functionality of the ICP algorithm highly depends on a good initial alignment. Hence, we refine the previous coarse alignment with the convex hull around the motion masks of the event-based and conventional camera sensor. From the centers c_eb ∈ R^2 and c_rgb ∈ R^2 of the hulls, we can refine the displacement T_ref ∈ R^{3×3} as follows:

t_x = c_rgb_x − c_eb_x,   t_y = c_rgb_y − c_eb_y,   (8)

⇒ T_ref = [1 0 t_x; 0 1 t_y; 0 0 1].   (9)

Fig. 6: Image registration between the event-based camera (green) and the RGB camera (red). (a) Before ICP alignment, (b) after ICP alignment.
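Equations (6)–(9) map almost directly onto OpenCV primitives. The sketch below is our reading of these equations (the function and variable names are ours); the inputs are the binary motion-edge masks produced by the two pre-processing branches:

```python
import cv2
import numpy as np

def coarse_and_refined_alignment(mask_eb, mask_rgb):
    # Coarse alignment, Eqs. (6)-(7): scale the event bounding rectangle onto the RGB one.
    xe, ye, we, he = cv2.boundingRect(mask_eb)
    xr, yr, wr, hr = cv2.boundingRect(mask_rgb)
    sx, sy = wr / we, hr / he
    T_coarse = np.array([[sx, 0.0, -sx * xe + xr],
                         [0.0, sy, -sy * ye + yr],
                         [0.0, 0.0, 1.0]])

    # Refinement, Eqs. (8)-(9): shift by the difference of the convex-hull centres.
    def hull_center(mask):
        hull = cv2.convexHull(cv2.findNonZero(mask))
        m = cv2.moments(hull)
        return np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]])

    tx, ty = hull_center(mask_rgb) - hull_center(mask_eb)
    T_ref = np.array([[1.0, 0.0, tx],
                      [0.0, 1.0, ty],
                      [0.0, 0.0, 1.0]])
    return T_coarse, T_ref   # composed with the ICP homography as in Eq. (10)
```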
As previously mentioned, the dynamic object must have a sufficient distance from the cameras so that it corresponds approximately to a planar surface. With this assumption, we can extract points from the known edges and calculate the homography matrix H_ICP ∈ R^{3×3} using the ICP algorithm. This algorithm further refines the scaling and translation as well as taking the rotation and perspective of the camera orientation into account. Then, we receive the final projective transformation matrix H_final ∈ R^{3×3}, which takes T_coarse and T_ref into account as follows:

T_beforeICP = T_ref · T_coarse,
H_final = H_ICP · T_beforeICP.   (10)

With this projective transformation and considering the intrinsics K_eb ∈ R^{3×3} of the event-based camera and K_rgb ∈ R^{3×3} of the conventional camera, we can describe the projection of a point p_eb to p_rgb with the formula K_rgb^{−1} H_final K_eb.

In addition, we can now calculate the extrinsic parameters with the OpenCV method “decomposeHomographyMat”. Note: As previously mentioned, in the targetless extrinsic calibration approaches [23] and [24], only a translation up to scale is determined. So the direction of the vector is known, but not its length. This limitation has no negative consequences for us since we can now perform precise data fusion with the calibration of both sensor modalities.
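A minimal sketch of how Eq. (10) and the decomposition could be chained is given below. H_icp is assumed to come from the ICP-based registration above, and how the two intrinsic matrices enter the decomposition (via the normalized homography K_rgb^{−1} H_final K_eb) is our interpretation of the formula in the text:

```python
import cv2
import numpy as np

def finalize_calibration(H_icp, T_coarse, T_ref, K_eb, K_rgb):
    # Eq. (10): chain coarse alignment, refinement shift and the ICP homography.
    H_final = H_icp @ (T_ref @ T_coarse)

    # Normalized homography between the two cameras (the paper's K_rgb^-1 H_final K_eb).
    H_norm = np.linalg.inv(K_rgb) @ H_final @ K_eb

    # Candidate rotations and translations (translation only up to scale).
    _, rotations, translations, _ = cv2.decomposeHomographyMat(H_norm, np.eye(3))
    return H_final, rotations, translations
```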

Fig. 7: Our setup consists of an event-based camera (left) and a normal camera (right). (a) Gantry bridge at the ITS Providentia++ [33], (b) laboratory parking lot.

IV. EVALUATION

Here, we present the results of our targetless extrinsic calibration algorithm using real-world data. First, we measure our approach in terms of runtime performance. Then, we explain the sensor setup used for data recording. And last, we analyze quantitative and qualitative calibration results and discuss them in terms of strengths and limits.

A. Runtime performance

Since the runtime performance could become important for us, we also analyze our extrinsic calibration approach for real-time suitability. Table I shows the computation time of each step of our method. In total, we have reached 583 ms per frame. The image resolution of the event-based and RGB image was 640 × 480 pixels. The measurements were carried out on an Intel Core i7-8550U CPU, an NVIDIA GeForce GTX 1050 with Max-Q Design, and 8 GB RAM. We have observed that our approach is suitable for fast online calibration.

TABLE I: Runtime performance of our targetless extrinsic calibration pipeline.

Component                            Runtime
Pre-processing event-based camera    23 ms
Pre-processing RGB camera            400 ms
Image registration and matching      160 ms
Total                                583 ms

B. Sensor setup

For our experiments, we mounted an event-based camera and a conventional camera on two tripods. This gave us the necessary flexibility for our experiments. The event-based camera was positioned to the left of the conventional camera. Figure 7 shows the sensor setup for the recorded sequences. The conventional camera was a Basler acA1920-50gc with a 16 mm lens. The Imago VisionCam EB was used as an event-based camera, also with a 16 mm lens. The intrinsics of the event-based camera K_eb and conventional camera K_rgb, and their distortion models, have been calibrated beforehand.

C. Calibration results

To test the effectiveness of our algorithm, we recorded three sequences. Then, we took the frames where the assumptions mentioned in Section III were met. First, we evaluated the calculated rotation and translation between the event-based and conventional cameras. The cameras were not moved during the measurement. Since an exact ground truth measurement of the translation and rotation of the sensor setup in the real world was not possible for us, we have analyzed the tolerance range of the extrinsic values in the sequences. Figure 8 shows the results. In Sequences 1 and 3, we have recognized a rotation of almost 0° with a maximum spread of 0.57° in the X-direction for Sequence 1. As previously mentioned, only the direction of the translation vector is obtained from the homography matrix. The calculated directions are plausible given our assessment of the test setup, where primarily a displacement in the positive X-direction and negative Y-direction took place. Unfortunately, in Sequence 2, unique values per frame could not always be decomposed from the homography matrix. Therefore, in these cases we selected the solution based on plausibility from the four possibilities. Here, the rotation was measured correctly at almost 0°, but outliers have been found in the translation. We attributed this to the inaccuracies in the homography calculation of Sequence 2, which is explained in more detail in the analysis of the reprojection error below.

TABLE II: We used the reprojection error in pixels as our quantitative metric for our results. Here, we present the error of our algorithm as well as our manually created ground truth (GT). Sequences 1 and 3 achieve very high accuracies. Due to the smaller object size, the error in Sequence 2 increases.

Sequence   Frame   Rep. Err.   Rep. Err. GT
1          1       4.54        1.61
1          2       4.90        1.39
1          3       3.87        1.08
1          4       6.45        1.71
1          5       2.60        1.37
1          6       4.42        2.51
2          1       7.26        2.18
2          2       8.91        2.52
2          3       8.75        1.96
2          4       8.99        2.31
2          5       5.51        1.81
2          6       8.26        1.53
3          1       6.42        0.91
3          2       3.52        1.48
3          3       3.81        1.44
3          4       4.40        0.99
3          5       3.46        1.04
3          6       4.70        0.82

As the second evaluation step, we calculated the reprojection error of our calibration approach based on our ground truth data, which we manually created by clicking. Table II shows the reprojection error of the ground truth data and our extrinsic calibration. The best reprojection error was 2.60 px and could be found in Sequence 1. The average reprojection error in Sequence 1 was 4.46 px, in Sequence 2 7.95 px, and in Sequence 3 4.39 px. This shows that our approach works effectively. Room for improvement of our approach can be seen in Sequence 2: The poorer values can be explained by the relatively small representation of the vehicle in the image (see Figure 9). The slightest deviations in the image registration result in high inaccuracies. The larger the image area for calculating the calibration values, the more accurate the result. The approach of [34] performs targetless extrinsic calibration between two conventional RGB cameras. They have achieved an average reprojection error of 4.01 px. The extrinsic calibration [6] between event-based and conventional cameras using a checkerboard had an error of 2.16 px. So, we are quantitatively precise and comparable to existing calibration methods.

Fig. 8: Tolerance range of the rotation and translation direction in pixels (lower is better). (a) Sequence 1, (b) Sequence 2, (c) Sequence 3. Each panel plots Rot_x, Rot_y, Rot_z, Transl_x, Transl_y, and Transl_z; translation is given in pixels and rotation in degrees.

[Fig. 9 image grid. Columns: Seq. 1 #2, Seq. 1 #5, Seq. 2 #1, Seq. 2 #2, Seq. 3 #1, Seq. 3 #5; rows: event camera, conventional camera, GT, ICP, results.]

Fig. 9: The qualitative results of our algorithm are plausible and satisfactory in all sequences. It can be seen that the event-based
camera is to the left of the RGB camera.

Furthermore, the qualitative results in Figure 9 demonstrate the projection of the event-based image into the conventional camera image. Here, we can see that the event-based camera was to the left of the conventional camera. All in all, the qualitative results clearly indicate that our approach works well in all sequences. However, we have to expect inaccuracies with smaller moving objects. In summary, our approach can be used for targetless extrinsic calibration between event-based and conventional cameras.
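For reference, the reprojection error reported in Table II can be computed along the following lines (a sketch; the ground-truth pairs are assumed to be the manually clicked pixel correspondences described above):

```python
import numpy as np

def reprojection_error(H_final, pts_eb, pts_rgb_gt):
    """Mean pixel distance between event-camera points projected with H_final
    and their manually clicked counterparts in the RGB image (both (N, 2) arrays)."""
    pts_h = np.hstack([pts_eb, np.ones((len(pts_eb), 1))])   # homogeneous coordinates
    proj = (H_final @ pts_h.T).T
    proj = proj[:, :2] / proj[:, 2:3]                        # back to pixel coordinates
    return float(np.linalg.norm(proj - pts_rgb_gt, axis=1).mean())
```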

V. CONCLUSION

Within the scope of this work, we have developed a novel targetless approach for extrinsic calibration between event-based and conventional cameras. For this purpose, we have first analyzed the characteristics of event-based cameras, and second, their calibration techniques.

Based on this research, we decided to perform the targetless extrinsic calibration with the image registration technique. Therefore, our algorithm finds correspondences of the detected motion between the event-based camera and the conventional camera using deep learning-based instance segmentation and sparse optical flow, then it calculates the transformation between both sensor modalities.

During experiments in the real environment, we were able to verify the effectiveness of our method: Our targetless extrinsic calibration approach is qualitatively and quantitatively precise, and comparable to existing state-of-the-art methods.

For future work, we propose to consider alternative registration techniques, as well as to perform the calculation of the transformation parameters based on multiple moving objects. These measures would further increase the stability and accuracy of our extrinsic multimodal sensor calibration.

ACKNOWLEDGMENT

This research was funded by the Federal Ministry of Education and Research of Germany in the project AUTOtech.agil, FKZ: 01IS22088U. We would like to express our gratitude for making this paper possible.

REFERENCES

[1] C. Creß, Z. Bing, and A. C. Knoll, “Intelligent transportation systems using external infrastructure: A literature survey.” [Online]. Available: https://arxiv.org/pdf/2112.05615
[2] G. Gallego, T. Delbruck, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. J. Davison, J. Conradt, K. Daniilidis, and D. Scaramuzza, “Event-based vision: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 154–180, 2022.
[3] A. Mitrokhin, C. Fermüller, C. Parameshwara, and Y. Aloimonos, “Event-based moving object detection and tracking,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1–9.
[4] M. Litzenberger, B. Kohn, A. N. Belbachir, N. Donath, G. Gritsch, H. Garn, C. Posch, and S. Schraml, “Estimation of vehicle speed based on asynchronous data from a silicon retina optical sensor,” in 2006 IEEE Intelligent Transportation Systems Conference, pp. 653–658.
[5] H. Rebecq, R. Ranftl, V. Koltun, and D. Scaramuzza, “High speed and high dynamic range video with an event camera,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 6, pp. 1964–1980, 2021.
[6] Q. Zhang, J. Ye, P. Osteen, and S. S. Young, Co-Calibration and Registration of Color and Event Cameras, 2020. [Online]. Available: https://apps.dtic.mil/sti/citations/AD1116489
[7] M. J. Dominguez-Morales, A. Jimenez-Fernandez, G. Jimenez-Moreno, C. Conde, E. Cabello, and A. Linares-Barranco, “Bio-inspired stereo vision calibration for dynamic vision sensors,” IEEE Access, vol. 7, pp. 138415–138425, 2019.
[8] E. Mueggler, B. Huber, and D. Scaramuzza, “Event-based, 6-dof pose tracking for high-speed maneuvers,” in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014, pp. 2761–2768.
[9] E. Mueggler, N. Baumli, F. Fontana, and D. Scaramuzza, “Towards evasive maneuvers with quadrotors using dynamic vision sensors,” in 2015 European Conference on Mobile Robots (ECMR), 2015, pp. 1–8.
[10] H. Kim, “Real-time visual slam with an event camera,” Ph.D. dissertation, 2017.
[11] M. Muglikar, M. Gehrig, D. Gehrig, and D. Scaramuzza, “How to calibrate your event camera,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2021.
[12] A. Censi and D. Scaramuzza, “Low-latency event-based visual odometry,” in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014.
[13] E. Dubeau, M. Garon, B. Debaque, R. de Charette, and J.-F. Lalonde, “Rgb-d-e: Event camera calibration for fast 6-dof object tracking,” in 2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, 2020.
[14] K. Huang, Y. Wang, and L. Kneip, “Dynamic event camera calibration,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021.
[15] C. Plasberg, M. G. Besselmann, A. Roennau, and R. Dillmann, “Intrinsic and extrinsic calibration method for a trinocular multimodal camera setup,” in 2022 25th International Conference on Information Fusion (FUSION). IEEE, 2022, pp. 1–6.
[16] R. Song, Z. Jiang, Y. Li, Y. Shan, and K. Huang, “Calibration of event-based camera and 3d lidar,” in 2018 WRC Symposium on Advanced Robotics and Automation (WRC SARA). IEEE, 2018.
[17] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004. [Online]. Available: https://link.springer.com/article/10.1023/B:VISI.0000029664.99615.94
[18] J. Nie, F. Pan, D. Xue, and L. Luo, “A survey of extrinsic parameters calibration techniques for autonomous devices,” in 2021 33rd Chinese Control and Decision Conference (CCDC). IEEE, 2021.
[19] K. Ta, D. Bruggemann, T. Brödermann, C. Sakaridis, and L. van Gool, “Lasers to events: Automatic extrinsic calibration of lidars and event cameras,” arXiv e-prints, p. arXiv:2207.01009, 2022.
[20] X. Huang, G. Mei, J. Zhang, and R. Abbas, A comprehensive survey on point cloud registration. arXiv, 2021.
[21] P. J. Besl and N. D. McKay, “A method for registration of 3-d shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239–256, 1992.
[22] T. Fu, H. Yu, W. Yang, Y. Hu, and S. Scherer, “Targetless extrinsic calibration of stereo cameras, thermal cameras, and laser sensors in the wild.” [Online]. Available: https://arxiv.org/pdf/2109.13414
[23] M. Knorr, W. Niehsen, and C. Stiller, “Online extrinsic multi-camera calibration using ground plane induced homographies,” in 2013 IEEE Intelligent Vehicles Symposium (IV), 2013, pp. 236–241.
[24] M. Miksch, B. Yang, and K. Zimmermann, “Automatic extrinsic camera self-calibration based on homography and epipolar geometry,” in 2013 IEEE Intelligent Vehicles Symposium (IV), 2013, pp. 832–839.
[25] W. Kin-Yiu, “Implementation of paper - yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” 14/01/2023. [Online]. Available: https://github.com/WongKinYiu/yolov7/tree/mask
[26] arXiv.org, “Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” 07/07/2022.
[27] H. Chen, K. Sun, Z. Tian, C. Shen, Y. Huang, and Y. Yan, “Blendmask: Top-down meets bottom-up for instance segmentation.” [Online]. Available: https://arxiv.org/pdf/2001.00309
[28] V. Brebion, J. Moreau, and F. Davoine, “Real-time optical flow for vehicular perception with low- and high-resolution event cameras,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–13, 2021.
[29] J. Bouguet, “Pyramidal implementation of the lucas kanade feature tracker,” 1999.
[30] J. Shi and C. Tomasi, “Good features to track,” in 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1994, pp. 593–600.
[31] J. Kim, X. Wang, H. Wang, C. Zhu, and D. Kim, “Fast moving object detection with non-stationary background,” Multimedia Tools and Applications, vol. 67, no. 1, pp. 311–335, 2013. [Online]. Available: https://link.springer.com/article/10.1007/s11042-012-1075-3
[32] Z. Zivkovic and F. van der Heijden, “Efficient adaptive density estimation per image pixel for the task of background subtraction,” Pattern Recognition Letters, vol. 27, no. 7, pp. 773–780, 2006.
[33] innovation-mobility.com, “Providentia++: A9 testfeld für autonomes fahren und digitale erkennung von fahrzeugen,” 18/08/2022. [Online]. Available: https://innovation-mobility.com/projekt-providentia/
[34] B. Pätzold, S. Bultmann, and S. Behnke, “Online marker-free extrinsic camera calibration using person keypoint detections,” Cham, pp. 300–316, 2022.
