Project

Improving State Estimation Accuracy of
VINS-Fusion with the Kalman Filter

Christopher Suzuki
Department of Mechanical Engineering
Carnegie Mellon University
csuzuki2@andrew.cmu.edu

Jingchun Cao
Department of Mechanical Engineering
Carnegie Mellon University
jingchuc@andrew.cmu.edu

Leo Chen
Department of Mechanical Engineering
Carnegie Mellon University
ikaic@andrew.cmu.edu

Peter Blumenstein
Department of Mechanical Engineering
Carnegie Mellon University
pblumens@andrew.cmu.edu

Abstract—VINS-Fusion is a proven state estimation algorithm used in Simultaneous Localization and Mapping (SLAM). In this paper, we attempt to improve the state estimation accuracy of VINS-Fusion by coupling it with a Kalman Filter. Specifically, we compare the results of VINS-Fusion by itself to the outputs of VINS-Fusion plus an Extended Kalman Filter and VINS-Fusion plus an Unscented Kalman Filter. The respective algorithms are tested with the EuRoC MAV dataset. Our results show that the accuracy does not significantly change when VINS-Fusion is combined with either variant of the Kalman Filter. We hypothesize that our assumption of constant covariance matrices for both Kalman Filters strongly impacted the performance of our algorithms.

Index Terms—VINS-Fusion, SLAM, state estimation, Kalman Filter

I. INTRODUCTION

Over the last few years, we have seen a rise in the demand for autonomous vehicles (AVs). This rise in demand is largely due to the fact that the social, economic, and environmental benefits of AVs are becoming more widely recognized. AVs reduce the need for a human driver, relieving humans of a taxing obligation. AVs can also be turned into Robotaxis when not in use, providing economic gain for the owner. This can also help limit the number of vehicles on the road, decreasing the carbon emissions caused by vehicle use.

The benefits of AVs do not stop there. An AV is essentially an unmanned vehicle (UV), so its applications extend beyond the road. A good example would be autonomous robots that operate in closed settings, such as robot vacuums used to clean homes or factory robots used to transport goods around manufacturing facilities. Another good example would be drones. Drones can be utilized in construction environments for safe inspection of hard-to-reach areas. They can also be used during search-and-rescue operations.

However, the effectiveness of a UV depends on the methods that help it navigate its environment safely and effectively.

One of said methods is Simultaneous Localization and Mapping (SLAM), which has proven to be an integral part in the effectiveness of UVs. The essential goal of a SLAM algorithm is to create a map of the vehicle's environment, so it can estimate its relative state to objects in its environment. What the state comprises depends on the use case of the UV, but it usually includes the position, orientation, and speed of the UV.

Two proven algorithms for state estimation are VINS-Fusion and the Kalman Filter (KF). VINS-Fusion, a form of Visual-Inertial System (VINS), has already proven to be very effective [1]. However, we wanted to see if we can improve its effectiveness by coupling it with another proven algorithm.

As a result, in this project, we attempt to improve the state estimation of VINS-Fusion by modifying it with the KF. Since there are multiple forms of the KF, the two specific KF variations we have chosen are the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF). We have chosen these two variations for the following reasons:
1) EKF and UKF provide different approaches to nonlinearity. We wanted to compare these two approaches.
2) Simple implementation
3) Low computational cost

To summarize, our project will be a comparison of the following:
1) VINS-Fusion
2) VINS-Fusion + EKF
3) VINS-Fusion + UKF

II. RELATED WORKS

There has already been a lot of proven work on VINS, showing its advantages. VINS uses both visual data and inertial measurement unit (IMU) data as its inputs [1]. Using these two kinds of inputs allows VINS to estimate roll and pitch angles, in addition to the usual positions and velocities. By incorporating IMUs, VINS can also mitigate situations in which the visual sensors perform poorly.

However, despite its many advantages, VINS has drawbacks too. VINS is a nonlinear algorithm. It also does not have direct distance measurements. Both of these characteristics make initializing VINS extremely difficult. Furthermore, VINS is prone to drift. A lot of additional sub-algorithms
need to be run to handle the drift, increasing the algorithm's complexity.

There have been extensions of VINS to improve it. One such extension is VINS-Mono [1]. VINS-Mono has a novel initialization procedure to deal with the initialization problem. It also incorporates an online relocalization method to handle drift. Another key advantage of VINS-Mono is that it can merge previous pose graphs with the current graph for more accurate state estimation. An overview of VINS-Mono can be seen in Fig. 1.

Fig. 1. VINS-Mono pipeline [1].

VINS-Mono was tested against a state-of-the-art algorithm called OKVIS. VINS-Mono, with loop closure, proved to perform better at longer distances, as seen in its smaller relative pose error (see Fig. 2).

Fig. 2. Relative pose error of VINS-Mono against OKVIS [1].

Furthermore, since we are looking to incorporate KF into VINS, we looked at previous works that have done the same. One such work is a VINS variation named OpenVINS [2]. It similarly uses both visual and IMU data, and also incorporates an EKF.

OpenVINS was tested in simulation and in the real world against VINS-Fusion, an extension of VINS-Mono, and other VINS algorithms. The results showed that OpenVINS not only reduced drift but also outperformed the other algorithms in state estimation. This is seen in OpenVINS' relatively lower absolute pose error (see Fig. 3).

Fig. 3. OpenVINS relative pose error in comparison to current VINS algorithms [2].

With this in mind, we decided to build on these two previous works. We use VINS-Fusion, an extension of VINS-Mono, as our base VINS algorithm [1]. Inspired by OpenVINS, we wanted to couple VINS and KF to see if performance could be improved. The two variations of KF we have chosen are EKF and UKF.

III. METHODOLOGY

In our project, we studied the impact of EKF and UKF on the camera path obtained from VINS-Fusion. On a high level, this is done in the following steps:
1) Simultaneous application and visualization of EKF on VINS-Fusion results
2) Simultaneous application and visualization of UKF on VINS-Fusion results
3) Post-processing and comparison of EKF and VINS-Fusion errors
4) Post-processing and comparison of UKF and VINS-Fusion errors

Fig. 4 shows the overview of our pipeline.

Fig. 4. Project overview.

EKF and UKF are both commonly used and well-established techniques for pose estimation, estimation refinement, and simultaneous mapping, whereas VINS-Fusion is a state-of-the-art SLAM algorithm first published in 2019 [3]. Since its publication, VINS-Fusion has become known for producing highly accurate SLAM results, making it valuable for further studies.

In this section, we briefly introduce VINS-Fusion, EKF, and UKF, explain the assumptions we made to obtain the results, and briefly discuss our comparison method.
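
As a rough illustration of steps 1 and 2 listed earlier in this section, the sketch below runs an EKF predict/correct cycle over a sequence of positions such as those reported by VINS-Fusion. It is a minimal sketch under stated assumptions, not our implementation: the constant-velocity motion model, the position-only measurement model, the time step `DT`, the values of `Q` and `R`, and the function names `ekf_step` and `filter_path` are all hypothetical choices made for the example. Because the assumed model is linear, the Jacobians are constant and the EKF reduces to a standard Kalman filter here.

```python
import numpy as np

def ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R):
    """One EKF predict/correct cycle with constant Q and R."""
    # Prediction: propagate the mean through f and the covariance
    # through the Jacobian of f evaluated at the current mean.
    F = F_jac(x)
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # Correction: Kalman gain from the linearized measurement model.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Assumed (hypothetical) model: state = [px, py, pz, vx, vy, vz],
# constant velocity between samples, position-only measurements.
DT = 0.1
A = np.eye(6)
A[:3, 3:] = DT * np.eye(3)
C = np.hstack([np.eye(3), np.zeros((3, 3))])
f = lambda x: A @ x
F_jac = lambda x: A
h = lambda x: C @ x
H_jac = lambda x: C
Q = 1e-3 * np.eye(6)  # illustrative constant process noise, not the values in Fig. 7
R = 1e-2 * np.eye(3)  # illustrative constant measurement noise

def filter_path(positions):
    """Filter an (N, 3) array of positions and return the filtered (N, 3) path."""
    x = np.concatenate([positions[0], np.zeros(3)])  # start at rest at the first pose
    P = np.eye(6)
    filtered = []
    for z in positions:
        x, P = ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R)
        filtered.append(x[:3].copy())
    return np.array(filtered)
```

Keeping `Q` and `R` fixed mirrors the constant-covariance assumption discussed in this report; a tuned or adaptive covariance would replace those two constants.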
A. VINS-Fusion

VINS-Fusion solves the SLAM problem using information obtained from both visual sensors, such as cameras, and low-cost IMUs. It is an extension of the VINS-Mono algorithm. The following paragraph describes how VINS-Mono works on a high level.

The first step in the VINS-Mono algorithm is measurement preprocessing. In this step, key features are extracted from the input visual data, and IMU measurements between two consecutive frames are preintegrated. The preprocessing step is followed by the initialization step, where values of pose, velocity, gravity vector, IMU (gyroscope) bias, and 3-D feature locations are first generated [1]. They are usually used for 6-DoF state estimation in small environments [3]. The initialization step is necessary for the following step, where nonlinear optimization-based visual-inertial odometry (VIO) is performed. In the last step, optimization-based methods or filters such as EKF are applied to geometrically verified relocalization results in order to reduce drift [1]. This process can be viewed as the first step in VINS-Fusion, the local localization step. Although both approaches have been proven to produce accurate local localization results, drift is unavoidable over time, making an additional step necessary [3]. VINS-Fusion then extends the VINS-Mono algorithm by adding a second step, the global localization step, where data obtained from global sensors such as GPS and magnetometers is fused with local sensor data for more accurate, globally aware localization results. Using global sensor data alone is insufficient since it is usually noisy and obtained at a low frequency [3].

VINS-Fusion's complexity, combined with the fact that it is relatively young and has less support compared to older and well-known programs, makes it challenging to implement. In this project, we worked with the HKUST implementation of VINS-Fusion, which required the Robot Operating System (ROS) Kinetic or Noetic. The HKUST implementation uses optimization-based approaches in its local localization step instead of EKF-based methods. Qin et al. explain that optimization is chosen over EKF since states can be iteratively linearized to increase accuracy in optimization-based approaches [3].

It is worth noting that accurate initialization is important for obtaining good VINS performance [1]. This led to our decision to use open datasets for which the system has already been tuned.

Fig. 5 shows a screenshot of VINS-Fusion running on our laptop.

Fig. 5. VINS-Fusion

B. EKF

Since its publication, EKF has been popular due to its simplicity, high accuracy, and high efficiency in solving local localization problems [3], making it worthy of study.

There are 3 major steps in EKF. The first step is the initialization step, where values of poses and their corresponding covariance are first generated. It is followed by the prediction step, where a new state and new covariance are predicted based on the previous state using a nonlinear state transition function. Then, in the correction step, EKF incorporates measurements obtained from sensors to compute the Kalman gain and correct the prediction made in the prediction step. This process is performed iteratively. At each time step, EKF linearizes the model around its mean via a first-order Taylor-series expansion, which requires computing the Jacobian matrices [4].

C. UKF

UKF is structurally similar to EKF in the sense that it also comprises three major steps: the initialization step, the prediction step, and the correction step. However, instead of linearizing the system around its mean via Taylor expansion, UKF computes a set of "sigma points" (each carrying a weight) and transforms each sigma point through the nonlinear function. This means that no Jacobian matrices need to be computed for UKF. A comparison between EKF and UKF can be found in Fig. 6.

Fig. 6. EKF vs. UKF comparison [5].

In general, UKF is expected to yield the same results as EKF for linear systems and better results for nonlinear systems [4].

D. Assumptions

It was observed that VINS-Fusion generates zero values in its covariance matrix due to its use of an optimization
estimation technique, which precludes predicting the covariance at future time steps. To enhance computational efficiency, we integrated constant covariance matrices into VINS-Fusion, which were then used as inputs for the Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF). Fig. 7 and Fig. 8 show the covariance matrices we used in this project:

Fig. 7. Covariance Matrix for EKF

Fig. 8. Covariance Matrix for UKF

The presented covariance matrices have been adjusted to minimize errors specifically for the Vicon dataset, and recalibration is necessary for other scenarios. Due to time constraints, only a subset of the 16 parameters in the UKF were tuned, and similar limitations applied to the EKF. Future work should include developing a robust covariance estimator to improve the performance of both the EKF and UKF.

E. Comparison Method

We compared the VINS-Fusion results with the VINS-Fusion + EKF results and the VINS-Fusion + UKF results in two ways, as shown in Fig. 9.

Fig. 9. Result Analysis Method Overview

Since EKF and UKF are both highly efficient algorithms with low computing power and memory requirements, we were able to apply them in real time alongside VINS-Fusion to generate a live demo comparing the optimized path and the optimized + filtered paths. Fig. 10 shows EKF and VINS-Fusion running simultaneously. The green path is the original path estimated by VINS-Fusion, whereas the red arrow represents the current pose produced by EKF based on the VINS-Fusion results. In addition, we recorded all generated states and compared their mean, RMSE, and standard deviation (STD), as well as their errors relative to the ground truth, in the post-processing step.

Fig. 10. Result Analysis Method Overview

During testing, we found that feeding filtered states back into VINS-Fusion consistently generated larger errors than not doing so. This is probably due to our assumption that the covariance matrices are constant. Therefore, for a more meaningful discussion, this report mainly compares results obtained by applying EKF and UKF to the VINS-Fusion results, and not results obtained by applying VINS-Fusion to the EKF and UKF results.

IV. EXPERIMENTS

In this study, we conducted experiments to systematically evaluate VINS-Fusion with EKF and UKF integrated into the state estimation process. Three distinct scenarios were examined: VINS-Fusion operating independently, VINS-Fusion augmented with EKF (VINS+EKF), and VINS-Fusion augmented with UKF (VINS+UKF). The assessments used data from the EuRoC MAV dataset, specifically the Vicon Room 1 01 and Machine Hall 01 sequences. Our objective was to investigate the impact of these filtering methods on the accuracy and robustness of the state estimation derived from VINS-Fusion. The real-world sequences of the EuRoC MAV dataset provide a practical foundation for assessing the algorithm in diverse and challenging environments. Through these three scenarios, we aim to provide insight into whether filtering can improve the performance of VINS-Fusion in complex scenarios.

V. RESULTS

The experiments yielded an estimated trajectory and corresponding error results for each scenario. Each scenario was compared against the ground truth trajectory provided by the EuRoC MAV Vicon Room 1 01 sequence, revealing minor deviations. As depicted in Fig. 14, the standalone application
of VINS-Fusion demonstrated deviations comparable to those observed when VINS-Fusion was augmented with an EKF. Notably, incorporating a UKF into VINS-Fusion resulted in a marginal increase in state estimation error. However, we attribute this discrepancy to insufficient tuning of the static covariance matrix for the given experimental dataset. In addition, the estimated trajectory and ground truth data were aligned using Umeyama alignment because they are stored with different origins. This alignment ensured a consistent reference frame for accurate comparison.

The following figures show the case-by-case deviation of the estimated camera trajectory using the different techniques.

Fig. 11. EKF Trajectory Error Result.

Fig. 12. VINS Trajectory Error Result.

Fig. 13. VINS + EKF Trajectory Error Result.

Fig. 14. VINS + UKF Trajectory Error Result.

The plots highlight the difficulty that VINS-Fusion, like several visual SLAM algorithms, has during rapid turning motions, which appear as the darker red portions of the trajectories. In contrast, the smoother and straighter portions of the trajectories appear in dark blue. With enhanced covariance tuning, particularly for turning dynamics, the drift observed during such maneuvers could be decreased and the state estimation refined. We anticipate that this improvement is attainable by tuning both the EKF and UKF components, improving overall trajectory accuracy and robustness.

VI. CONCLUSION

In this paper, we set out to determine whether the use of a Kalman Filter could improve the state estimation accuracy of VINS-Fusion. Our results demonstrated that VINS-Fusion combined with an Extended Kalman Filter produced similar results to VINS-Fusion alone, but did not significantly improve accuracy. The use of a UKF actually increased the state estimation error slightly.

Though our results demonstrate VINS-Fusion to be most accurate on its own, improvements can be made to better refine our findings.

For both KFs, a constant covariance matrix was assumed. We hypothesize this influenced the performance of both VINS-KF variants. With tuned covariance matrices, the VINS-KF variants should perform better in comparison to our initial results.

However, even with tuned covariance matrices, it is not expected that the VINS-KF variants will perform significantly better than VINS-Fusion alone. VINS-Fusion has already proven to be a state-of-the-art state estimation algorithm. Further modifications to it are unlikely to improve its existing performance. As a result, even with constant covariance matrices, we still see our results as being a conclusive comparison between VINS-Fusion, VINS-Fusion + EKF, and VINS-Fusion + UKF.
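
For readers who wish to reproduce this kind of accuracy comparison, the following is a minimal sketch of the post-processing described in Section V: align the estimated trajectory to the ground truth with Umeyama alignment, then compute the RMSE of the remaining position error. It is not our evaluation code. The function names `umeyama_align` and `ate_rmse` are hypothetical, the trajectories are assumed to be already time-matched (N, 3) position arrays, and rigid alignment without scale is assumed by default.

```python
import numpy as np

def umeyama_align(est, gt, with_scale=False):
    """Least-squares transform mapping est onto gt (Umeyama alignment).

    est, gt: (N, 3) arrays of matched positions.
    Returns (s, R, t) such that gt is approximated by s * R @ est_i + t.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e, g = est - mu_e, gt - mu_g
    cov = g.T @ e / len(est)              # 3x3 cross-covariance of the two point sets
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                    # avoid returning a reflection
    R = U @ S @ Vt
    var_e = (e ** 2).sum() / len(est)
    s = np.trace(np.diag(D) @ S) / var_e if with_scale else 1.0
    t = mu_g - s * R @ mu_e
    return s, R, t

def ate_rmse(est, gt):
    """RMSE of the position error after aligning est to gt."""
    s, R, t = umeyama_align(est, gt)
    aligned = (s * (R @ est.T)).T + t
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

Calling `ate_rmse` once per scenario (VINS-Fusion, VINS-Fusion + EKF, VINS-Fusion + UKF) against the same ground truth gives one comparable number per method.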
REFERENCES
[1] T. Qin, P. Li, and S. Shen, "VINS-Mono: A robust and versatile monocular visual-inertial state estimator," IEEE Transactions on Robotics, vol. 34, no. 4, pp. 1004–1020, 2018.
[2] P. Geneva, K. Eckenhoff, W. Lee, Y. Yang, and G. Huang, "OpenVINS: A research platform for visual-inertial estimation," in 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020, pp. 4666–4672.
[3] T. Qin, S. Cao, J. Pan, and S. Shen, "A general optimization-based framework for global pose estimation with multiple sensors," 2019.
[4] R. E. Michael Kaess, "Extended Kalman filter - 16-833 Robot Localization and Mapping lecture slides," 2023.
[5] H. S. Chadha, "The unscented Kalman filter: Anything EKF can do I can do it better!" Nov 2019. [Online]. Available: https://towardsdatascience.com/the-unscented-kalman-filter-anything-ekf-can-do-i-can-do-it-better-ce7c773cf88d
