Contextual Autonomy Evaluation of Unmanned Aerial Vehicles in Subterranean Environments
Abstract—In this paper we focus on the evaluation of contextual autonomy for robots. More specifically, we propose a fuzzy framework for calculating the autonomy score of a small Unmanned Aerial System (sUAS) performing a task while considering task complexity and environmental factors. Our framework is a cascaded Fuzzy Inference System (cFIS) composed of a combination of three FIS which represent different contextual autonomy capabilities. We performed several experiments to test our framework in various contexts, such as endurance time, navigation, take off/land, and room clearing, with seven different sUAS. We introduce a predictive measure which improves upon previous predictive measures, allowing for previous real-world task performance to be used in predicting future mission performance.

Index Terms—Contextual Autonomy, Unmanned Aerial Vehicles, Fuzzy Systems

I. INTRODUCTION

In today's world, robots are expected to become increasingly present, assisting humans in performing various tasks in different environments. While some robots have been designed for a single purpose, others can accomplish a variety of tasks with different levels of autonomy. Measuring robot autonomy is an important and ever-evolving concept, and existing methods for evaluating robot autonomy can be categorized into two main families: contextual and non-contextual. While the former methods consider mission- and task-specific measures (e.g., ALFUS [1], ACL [2]), the latter rely only on implicit system capabilities and do not consider mission and environment features (e.g., NCAP [3], [4]).

Our study in this paper focuses on evaluating the contextual autonomy of small Unmanned Aerial Systems (sUAS). Existing methods such as ALFUS [1] and MPP [5] share a similar shortcoming in that neither provides a simple implementation for use with real-world systems. Another drawback of existing methods that our approach addresses is the lack of a consistent process for breaking down tasks into sub-tasks and combining the scores calculated for sub-tasks into a unified score for the given task. In this paper we propose a method for evaluating the contextual autonomy of sUAS based on a fuzzy interface that allows the operator to design and modify the evaluation system using linguistic reasoning. We designed four indoor tasks (endurance time, navigation, takeoff/land, and room clearing) and tested our interface in various experiments with seven different sUAS. Our results show that the proposed approach calculates a contextual autonomy score that can be used to rank the systems for each context.

II. RELATED WORK

Some of the first and more simplistic methods of categorizing autonomous systems are the Levels of Automation (LOA) proposed by Sheridan [6] and its later expansion [7]. LOA defines automation as "the full or partial replacement of a function previously carried out by the human operator" on a 1 to 10 range, 1 being full control by the human and 10 being full control by the computer. LOA does not accurately describe how outside factors can affect the autonomous capability of a system. While it could theoretically be applied to a robot, it would not be accurate, as it fails to accommodate differing degrees of difficulty in tasks and environmental factors.

Another evaluation method is known as the Autonomy Control Levels (ACL) [2]. ACL is designed for Unmanned Aerial Vehicles (UAVs) and operates on a similar basis, utilizing autonomy levels from 0-10, with 0 being fully remotely controlled by a pilot and 10 being a human-like system. These levels closely resemble the 10 LOA, following the same concept. The ACL characterizes each system according to four metrics, which attempt to categorize different areas of autonomous behaviors for the system. In each of these, an autonomy level from 0-10 is given based upon these behaviors. This system has a similar drawback, in that it does not account for difficulties in the mission itself.

Another method is the Autonomy and Technological Readiness Assessment (ATRA) [8]. ATRA attempts to combine both the basic theory behind the Autonomy Level and the Technology Readiness Level (TRL) metric into one framework [8]. ATRA utilizes these two metrics in an attempt to evaluate the autonomy level provided by different technologies onboard the UAS. This is emphasized as a solution for the gap between existing theoretical work and technological advances in the UAS autonomy space.

Autonomy Levels for Unmanned Systems (ALFUS) is a method for defining the autonomy of a system in terms of three different axes [9]. ALFUS has a strong theoretical basis, but is somewhat impractical in real-world implementation due to the lack of maturity in some of these systems, as well as the inability of most, if not all, available systems to reach the upper levels of the three axes.

The three axes mentioned are known as the Mission Complexity (MC), Environmental Complexity (EC), and Human Independence (HI) axes. Each of these axes pertains to a different aspect of the contextual autonomy of a system. The MC axis pertains mostly to the difficulty of the tasks and movements required of the system to complete the task (e.g., maneuvers, speed, searching). The EC axis concerns itself with the difficulty in the performance of the task caused by environmental factors (e.g., lighting, obstacles, enclosed spaces). Lastly, the HI axis represents the level of independence between the user and the system (e.g., task planning, task execution).

Because a system's autonomy can be split across these three axes, ALFUS allows for the characterization and evaluation of a system's autonomy in real-world tests, including the impact that both the environment and the mission profile can have on the system's autonomy. Our work in this paper is based on many of the ideas put forward through ALFUS, and we utilize it as a foundational part of our contextual autonomy evaluation.

The Mission Performance Potential (MPP) is proposed as a method for the evaluation of an unmanned system's autonomous performance, as well as a predictor for future missions [5]. This method provides a metric which represents the maximum performance of a system in a given mission at a given autonomy level. Uniquely, this method includes both non-contextual autonomy metrics and contextual autonomy metrics, and provides a single output prediction based on both types of data.

One of the drawbacks of MPP is that it only provides a prediction of the performance of a system at a specified autonomy level for a specified mission. In other words, it does not evaluate how a real system performs, but rather the maximum potential for a system to perform. Our approach instead calculates the actual autonomy of a system based on data from real-world experiments.

Fig. 1: Our cascaded Fuzzy Inference System used for calculating a contextual autonomy score for a performed task.

III. FRAMEWORK

ALFUS' summary model works with a set of metrics for each of its three axes, as well as a system of levels from 0 to 10. These levels are based upon possible answers from those metrics, to provide a level evaluation of a system. As a generic framework, ALFUS tends to have a very broad, and somewhat open to interpretation, definition of metrics. For instance, in the case of the EC axis, it ranges from a "simple environment" to an "extreme environment." However, the summary model describes the system in terms of an autonomy level for each axis, while the Contextual Autonomy Capability within ALFUS provides an actual score for each axis. Due to the autonomy level evaluation, there is some ambiguity when characterizing systems. This is one of the main concerns with ALFUS: while it does provide a strong theoretical background, the actual implementation of the ideas with real-world systems is not as clear.

We utilize Takagi-Sugeno Fuzzy Inference Systems (FIS) as a means to combine different metrics in an evaluation of an sUAS which is both easy to use and allows us to combine standard quantitative metrics with data about the environment which is either not easily defined numerically or inherently qualitative. Fuzzy inference also prevents slight deviations in a metric from causing a drastic change in the evaluation of that sUAS. We designed a set of tests with various mission and environment complexity levels (see Section V), and defined a fuzzy inference system for each test. Unlike MPP [5], our fuzzy inference systems are based on the three-axis model used in ALFUS: we create an individual FIS for the metrics associated with each axis (i.e., MC, EC, HI), and an additional FIS which combines these three outputs into a single score. This structure, representing a cascaded FIS (cFIS), is illustrated in Fig. 1. For each test, the outcome of the FIS for all three axes is fed into a combining FIS that produces a final autonomy score. Each FIS in our cFIS is a Sugeno-type FIS with multiple inputs and one output. For each input of an FIS, we consider three membership functions (MFs) labeled as low, medium, and high. Without loss of generality, we used triangular MFs; however, other types of MF can be used. The input variables used in different tests and their corresponding MF parameters are reported in Table II. The output of each Sugeno-type FIS has five singleton MFs (i.e., constants): very bad, bad, medium, good, very good. Our FIS use a triangular fuzzifier and a Sugeno defuzzifier (i.e., weighted average output). For each FIS, we defined a rule base (i.e., a set of linguistic rules).

In the cFIS structure in Fig. 1, the defuzzified output of each FIS is a value in the range [0, 1]. For the initial three FIS, 0 and 1 represent the lowest and highest complexity, respectively. In the case of the final FIS, 0 and 1 represent the lowest and highest autonomy, respectively. If we define the singleton value of each output function as z_i, and the degree to which each output is weighted based upon the ruleset as w_i, then the final output score can be calculated as follows:

s = ( Σ_{i=1}^{N} w_i z_i ) / ( Σ_{i=1}^{N} w_i )    (1)

where N represents the number of rules in the rule base. Table I reports an example of the fuzzy ruleset we used.
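As a concrete illustration, the Sugeno weighted average in (1) over the ruleset of Table I can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: the triangular MF breakpoints, the product t-norm used for the rule firing strengths w_i, and the singleton output values z_i (very bad = 0 through very good = 1) are all assumed for illustration rather than taken from Table II.

```python
# Sketch of the final combining FIS (Table I) with the Sugeno
# defuzzification of Eq. (1). MF breakpoints, t-norm, and singleton
# values are illustrative assumptions, not the paper's parameters.

def tri(x, a, b, c):
    """Triangular membership function with feet at a, c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Three input MFs (low, medium, high) over a [0, 1] complexity score.
INPUT_MFS = {
    "low":    lambda x: tri(x, -0.5, 0.0, 0.5),
    "medium": lambda x: tri(x,  0.0, 0.5, 1.0),
    "high":   lambda x: tri(x,  0.5, 1.0, 1.5),
}

# Five singleton output values z_i (assumed evenly spaced on [0, 1]).
Z = {"very bad": 0.0, "bad": 0.25, "medium": 0.5, "good": 0.75, "very good": 1.0}

# Table I: (MC label, EC label) -> output label.
RULES = {
    ("low", "low"): "very bad",  ("medium", "low"): "bad",       ("high", "low"): "medium",
    ("low", "medium"): "bad",    ("medium", "medium"): "medium", ("high", "medium"): "good",
    ("low", "high"): "medium",   ("medium", "high"): "good",     ("high", "high"): "very good",
}

def combining_fis(mc, ec):
    """Eq. (1): weighted average of rule singletons z_i by firing strengths w_i."""
    num = den = 0.0
    for (mc_lbl, ec_lbl), out_lbl in RULES.items():
        w = INPUT_MFS[mc_lbl](mc) * INPUT_MFS[ec_lbl](ec)  # product t-norm (assumed)
        num += w * Z[out_lbl]
        den += w
    return num / den if den else 0.0
```

With these assumed singletons, a task performed under high mission and environment complexity defuzzifies to the maximum score, matching the "Very good" cell of Table I, while intermediate inputs blend neighboring rules smoothly.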
The advantage of this system is that we can utilize many different types of data and clearly define the ranges for each value, allowing the pilots performing the tests to provide feedback on the membership functions and rulesets.

                        Mission Complexity Axis
                        Low        Medium    High
Environment   Low       Very Bad   Bad       Medium
Complexity    Medium    Bad        Medium    Good
Axis          High      Medium     Good      Very Good

TABLE I: Fuzzy ruleset utilized in our final combinational FIS.

IV. UAS PLATFORMS

Fig. 2 illustrates the seven sUAS platforms evaluated in our experiments. The platforms include: the Cleo Robotics Dronut (https://cleorobotics.com/), Flyability Elios 2 (https://www.flyability.com/elios-2), Lumenier Nighthawk 3 (https://www.lumenier.com/), Parrot ANAFI USA GOV (https://www.parrot.com/us/drones/anafi), Skydio X2D (https://www.skydio.com/skydio-x2), Teal Drones Golden Eagle (https://tealdrones.com/suas-golden-eagle/), and Vantage Robotics Vesper (https://vantagerobotics.com/vesper/). These platforms provide a wide-ranging set of capabilities and use cases. For instance, the Parrot, Skydio X2D, Golden Eagle, and Vesper were developed for outdoor reconnaissance, whereas the Dronut and Elios 2 were developed for indoor reconnaissance and inspection, specifically in urban and industrial environments. Previously, we used the same set of sUAS for a non-contextual benchmarking [4], [10]. In our evaluations, we have anonymized the data by assigning the platforms labels A through G without any specific ordering or correlation.

Fig. 2: From left to right, top to bottom: Cleo Robotics Dronut, Flyability Elios 2, Lumenier Nighthawk 3, Parrot ANAFI USA GOV, Skydio X2D, Teal Drones Golden Eagle, Vantage Robotics Vesper.

V. TEST DESIGN

To evaluate the contextual autonomy of our platforms, we have designed several tests across a spectrum of areas. The variables for which we collected data in each test are reported in Table II. In this section, we describe each test briefly. As shown in Fig. 3, all tests have been designed for indoor environments.

A. Runtime Endurance

This family of tests focuses on the battery life of the system in various operational profiles. As shown in Fig. 3a, the specific test we use from this group focuses on the system flying continuously in a figure-8 pattern. The main performance metric for the test is the test duration.

B. Navigation

We have designed two main types of navigation tests, each with several profiles defined based on the type of movement (horizontal, vertical, or both) and the type of confinement (horizontal, vertical, or both). As shown in Fig. 3b, navigation through confined spaces involves traversal into and out of a continuously confined space, with tests for hallway, tunnel, stairwell, and shaft. Navigation through apertures involves transient traversal through an opening, with tests for doorway and window. Each navigation environment is characterized according to the dimensions of the confined space or aperture, lighting, surface textures, and the presence of obstructions on either side of the confined space or aperture. The main metrics of performance are efficacy and average navigation time.

C. Room Clearing

In this test method, the system performs a visual inspection of an example room whose walls, floor, and ceiling are outfitted with visual acuity targets which contain nested Landolt C symbols of decreasing size. As shown in Fig. 3c, the test was performed under two conditions: with and without using camera zoom. The main performance metrics are duration, coverage, and average acuity.

D. Takeoff and Land/Perch

As illustrated in Fig. 3d, these tests evaluate the system's ability to take off and land or perch in various environments that may be affected by stabilization issues or preventative safety checks from the system. The conditions tested vary the angle of the ground plane (flat, 5° and 10° pitch and roll) and the presence of obstructions (1.2-2.4 m overhead, 0.6-1.2 m lateral). The main performance metric is efficacy.

VI. RESULTS

Utilizing our framework outlined in Section III, we calculate a performance score for each sUAS based upon the conditions and performance metrics detailed below. As mentioned above, our system provides a single score for each of the three attributes (the EC, MC, and HI axes) and utilizes those scores to provide a single score for the entire test. It should be noted that although we consider all three axes in our cFIS structure, due to lack of data we consider the lowest level (i.e., full tele-operation) for the HI axis across all tests. The test-specific details of the structure are discussed in the corresponding sections. Another factor to note is that for some of these experiments, some sUAS were not available at the time of testing, due to the sUAS being repaired or other circumstances. In this case, we attempt to remedy this by calculating a partial point achieved by the sUAS. For situations where data for an entire test is unavailable, the corresponding entries are marked with a dash in Table III.
Fig. 3: Tests designed for the evaluation of sUAS contextual autonomy. (a) Runtime Endurance test showing a system performing a specified movement. (b) Navigation tests (left to right, top to bottom): hallway, tunnel, stairwell, shaft, door, window. (c) Room clearing test showing a system inspecting surfaces for visual acuity targets. (d) Takeoff and land/perch tests showing variations in ground plane angle (top row) and nearby obstructions (bottom row).

TABLE II: Membership Functions (MFs) for each input and output variable used in an FIS in the evaluation of these sUAS.

crashes, and completion percentage. As shown in Fig. 5, each sUAS that performed the through apertures test achieved a maximum score, besides sUAS G, which performed slightly worse, due both to issues in correctly traversing the aperture and to being the only sUAS to suffer a crash during the test. Next, for the through corridors test, as shown in Fig. 5, there is more variance in the performance between the sUAS, with sUAS A performing the best and sUAS E performing the worst, even though sUAS E tied for best in through apertures. This is important, as there is more room for error while traversing corridors than there is traversing an aperture.

Takeoff and Land/Perch: For the takeoff and land/perch test, the cFIS diagram can be found in Fig. 6e. The inputs include pitch, yaw, number of crashes, completion percentage, vertical and lateral obstruction, and number of rollovers. As can be seen in Fig. 5 and Table III, sUAS A and sUAS G perform best across both sections of the test and thus provide the highest level of autonomy. Likewise, sUAS B performs the worst in both portions of the test, showcasing a lower level of autonomy compared to the other sUAS. This evaluation allows for the characterization of how an sUAS may perform during portions of a mission which require the system to take off or land in a specified spot of varying difficulty.

Room Clearing: Since the room clearing test is done in a static environment, we included a time constraint in our environment.

UAS   T.C.   T.A.   Takeoff   Land   R.E.   R.C.   Predictive Score
A     1.0    1.0    1.0       1.0    0.5    0.76   0.85
B     0.90   1.0    0.71      0.87   0.76   0.73   0.82
C     0.84   1.0    1.0       0.87   -      -      0.92
D     -      -      0.75      0.97   0.65   0.75   0.77
E     0.80   1.0    0.82      0.89   -      0.85   0.87
F     -      -      0.99      0.91   -      -      0.95
G     0.83   0.83   1.0       1.0    0.5    0.79   0.80

TABLE III: Scores of each sUAS for each test, as well as a weighted multiple which allows for an overall evaluation of each sUAS.

Contextual autonomy evaluations are concerned with the performance of a system within an environment while performing a specific task with a known level of complexity. However, calculating an overall score that represents an average autonomy for a given system across a spectrum of tests and environments is desirable. To combine the test scores into a single score, we utilize a weighted product with equal weightings for each test, as we did previously in our non-contextual evaluation [4]. The weighted product is represented as

P = Π_{i=1}^{M} φ_i^{w_i}    (2)

where M is the number of individual tests, φ_i represents an individual test score, and w_i the weight assigned to that test.
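As a sketch (not the authors' code), Eq. (2) with equal weights w_i = 1/M reduces to a geometric mean over the tests a platform actually completed. The handling of missing tests below, i.e. simply excluding the dashed cells of Table III, is an inference from reproducing the table's predictive scores rather than something stated explicitly in the text.

```python
import math

def weighted_product(scores, weights=None):
    """Eq. (2): P = prod(phi_i ** w_i) over individual test scores.

    `None` entries mark missing tests (the '-' cells in Table III) and are
    excluded. By default the remaining tests share equal weights summing to
    one, which makes P the geometric mean of the available scores.
    """
    phis = [s for s in scores if s is not None]
    if weights is None:
        weights = [1.0 / len(phis)] * len(phis)
    return math.prod(phi ** w for phi, w in zip(phis, weights))

# Rows of Table III in order (T.C., T.A., Takeoff, Land, R.E., R.C.).
platform_a = [1.0, 1.0, 1.0, 1.0, 0.5, 0.76]
platform_c = [0.84, 1.0, 1.0, 0.87, None, None]

print(round(weighted_product(platform_a), 2))  # 0.85, matching Table III
print(round(weighted_product(platform_c), 2))  # 0.92, matching Table III
```

Rounding each result to two decimals reproduces the Predictive Score column of Table III for every platform, which is the sanity check used to justify the equal-weight, missing-test-excluded reading of Eq. (2) above.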
This weighted product has several benefits, including that different test results can be combined without requiring normalization or scaling. The results for each sUAS are shown in Table III. It is important to note that many of the sUAS perform better than the others in some tests, but worse in others. One example of this is sUAS A, which has the fourth highest weighted multiple (overall score), while performing the best in four out of six tests. This is due to the sUAS's relatively poor performance in both the runtime endurance test and the room clearing test. The use case of a singular score like this presents itself when a user would like to know which sUAS is likely to provide the most overall autonomy across multiple tests and different environments.

Fig. 6: Diagrams of each system of cascaded FIS utilized to calculate scores for each test: (a) Runtime Endurance (R.E.) test FIS, (b) Through Apertures (T.A.) test FIS, (c) Room Clearing (R.C.) test FIS, (d) Through Corridors (T.C.) test FIS, (e) Takeoff and Land/Perch test FIS.

VII. CONCLUSIONS AND DISCUSSIONS

In this paper, we proposed a framework for the evaluation of contextual autonomy for robotic systems. Our framework consists of a cascaded Fuzzy Inference System (cFIS) that combines test results over the three axes of evaluation (mission complexity, environment complexity, and human independence) introduced by the ALFUS framework. We designed four tests with different mission complexity and environment complexity levels, performed several experiments with several sUAS, and showed that our modular framework is adaptable to different tests. For future work, we plan to extend our framework for performance evaluation. To achieve this, a desired mission can be decomposed into base tasks, such as takeoff/landing, traversing through environments/apertures, clearing rooms, and general maneuverability. The user can then define a set of weights and calculate a potential performance score of an sUAS for the target mission. Unlike MPP [5], however, we suggest a method which is based upon performance in set tasks, rather than a combination of non-contextual attributes and environmental factors.

ACKNOWLEDGMENT

This work is sponsored by the Department of the Army, U.S. Army Combat Capabilities Development Command Soldier Center, award number W911QY-18-2-0006. Approved for public release #PR2022 88282.

REFERENCES

[1] P. J. Durst and W. Gray, "Levels of autonomy and autonomous system performance assessment for intelligent unmanned systems," Engineer Research and Development Center, Vicksburg, MS, Geotechnical and Structures Laboratory, Tech. Rep., 2014.
[2] B. T. Clough, "Metrics, schmetrics! How the heck do you determine a UAV's autonomy anyway?" Air Force Research Laboratory, Wright-Patterson Air Force Base, OH, Tech. Rep., August 2002.
[3] P. J. Durst, W. Gray, and M. Trentini, "A non-contextual model for evaluating the autonomy level of intelligent unmanned ground vehicles," in Proceedings of the 2011 Ground Vehicle Systems Engineering and Technology Symposium, 2011.
[4] B. Hertel, R. Donald, C. Dumas, and S. R. Ahmadzadeh, "Methods for combining and representing non-contextual autonomy scores for unmanned aerial systems," in Proceedings of the 8th International Conference on Automation, Robotics, and Applications (ICARA), 2022, pp. 145-149.
[5] P. Durst, W. Gray, A. Nikitenko, J. Caetano, M. Trentini, and R. King, "A framework for predicting the mission-specific performance of autonomous unmanned systems," in International Conference on Intelligent Robots and Systems, 2014, pp. 1962-1969.
[6] T. B. Sheridan, "Automation, authority, and angst – revisited," September 1991, vol. 35, pp. 2-6.
[7] R. Parasuraman, T. B. Sheridan, and C. D. Wickens, "A model for types and levels of human interaction with automation," IEEE Transactions on Systems, Man, and Cybernetics–Part A: Systems and Humans, vol. 30, pp. 286-297, May 2000.
[8] F. Kendoul, "Towards a unified framework for UAS autonomy and technology readiness assessment (ATRA)," pp. 55-71, February 2013.
[9] H.-M. Huang, "Autonomy levels for unmanned systems framework, volume II: Framework models," NIST Special Publication, Gaithersburg, MD, USA, p. 30, 2007.
[10] A. Norton, R. Ahmadzadeh, K. Jerath, P. Robinette, J. Weitzen, T. Wickramarathne, H. Yanco, M. Choi, R. Donald, B. Donoghue et al., "DECISIVE test methods handbook: Test methods for evaluating sUAS in subterranean and constrained indoor environments, version 1.1," arXiv preprint arXiv:2211.01801, 2022.