Nathalie Bertrand
Nils Jansen (Eds.)

Formal Modeling and Analysis of Timed Systems

18th International Conference, FORMATS 2020
Vienna, Austria, September 1–3, 2020
Proceedings
Lecture Notes in Computer Science 12288
Founding Editors
Gerhard Goos
Karlsruhe Institute of Technology, Karlsruhe, Germany
Juris Hartmanis
Cornell University, Ithaca, NY, USA
Editors

Nathalie Bertrand
Inria, Université de Rennes
Rennes, France

Nils Jansen
Radboud University
Nijmegen, The Netherlands
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
In this book, we present the proceedings of the 18th International Conference on Formal Modeling and Analysis of Timed Systems (FORMATS 2020). This edition of FORMATS was a special one. Amid a global crisis, we as a community strove to maintain a healthy and safe research environment. Early on, after many countries started shutting down most public life in reaction to the novel coronavirus, we decided that QONFEST 2020, and with it FORMATS, CONCUR, FMICS, and QEST, would not take place in Vienna but as a virtual event.

Despite the critical and difficult times, we are very happy to present a strong and interesting program for FORMATS 2020. In total, we received 36 submissions, and each one was reviewed by at least three Program Committee (PC) members. The committee, consisting of 28 members, decided to accept 16 very strong papers, for which the decisions were mostly unanimous.
FORMATS 2020 targeted timing aspects of systems from the viewpoints of different communities within computer science and control theory. In particular, the aim of FORMATS 2020 was to promote the study of fundamental and practical aspects of timed systems, and to be more inclusive of researchers from different disciplines who share an interest in the modeling and analysis of timed systems. As an example, FORMATS encouraged submissions related to foundations and semantics, methods and tools, as well as techniques, algorithms, data structures, and software tools for analyzing timed systems and resolving temporal constraints.
Moreover, we strongly encouraged contributions dedicated to applications. To
further foster emerging topics, we organized two special sessions:
Data-Driven Methods for Timed Systems: This session concerned all kinds of data-driven methods, such as machine learning or automata learning, that consider timing aspects. Examples are automata learning for timed automata or reinforcement learning with timing constraints. The session was chaired by Guillermo Alberto Perez.

Probabilistic and Timed Systems: Real-time systems often encompass probabilistic or random behavior. This session concerned all approaches to model or analyze such systems, for instance through probabilistic timed automata or stochastic timed Petri nets. It was chaired by Arnd Hartmanns.
In addition to the regular talks for the accepted papers, FORMATS 2020 featured an
excellent invited talk by Alessandro Abate, together with an overview paper that is part
of these proceedings. Moreover, invited talks by Roderick Bloem and Annabelle
McIver were organized jointly with the other conferences within QONFEST. Further
details may be found at https://formats-2020.cs.ru.nl/index.html.
Finally, some acknowledgments are due. First, we want to thank Ezio Bartocci for the great organization of QONFEST 2020. Second, we thank the Steering Committee of FORMATS, and notably Martin Fränzle, for their support; all the PC members and additional reviewers for their work in ensuring the quality of the contributions to FORMATS 2020; and all the authors and participants for contributing to this event. Finally, we thank Springer for hosting the FORMATS proceedings in its Lecture Notes in Computer Science series, and EasyChair for providing a convenient platform for coordinating paper submission and evaluation.
It was an honor for us to serve as PC chairs of FORMATS 2020, and we hope that this edition of the conference will inspire the research community with many further ideas and directions in the realm of timed systems.
Organization

Steering Committee

Program Committee
Mohamadreza Ahmadi, California Institute of Technology, USA
C. Aiswarya, Chennai Mathematical Institute, India
Nicolas Basset, Université Grenoble Alpes, France
Nathalie Bertrand (PC Co-chair), Inria, France
Anne Bouillard, Huawei Technologies, France
Patricia Bouyer, CNRS, France
Milan Ceska, Brno University of Technology, Czech Republic
Rayna Dimitrova, The University of Sheffield, UK
Uli Fahrenberg, École Polytechnique, France
Gilles Geeraerts, Université libre de Bruxelles, Belgium
Arnd Hartmanns, University of Twente, The Netherlands
Frédéric Herbreteau, University of Bordeaux, France
Laura Humphrey, Air Force Research Laboratory, USA
Nils Jansen (PC Co-chair), Radboud University, The Netherlands
Sebastian Junges, University of California, Berkeley, USA
Gethin Norman, University of Glasgow, UK
Marco Paolieri, University of Southern California, USA
Guillermo Perez, University of Antwerp, Belgium
Hasan Poonawala, University of Kentucky, USA
Krishna S., IIT Bombay, India
Ocan Sankur, CNRS, France
Ana Sokolova, University of Salzburg, Austria
Jiri Srba, Aalborg University, Denmark
B. Srivathsan, Chennai Mathematical Institute, India
External Reviewers
Deep Reinforcement Learning with Temporal Logics
M. Hasanbeig et al.

1 Introduction
Deep Reinforcement Learning (RL) is an emerging paradigm for autonomous decision-making tasks in complex and unknown environments. Deep RL has achieved impressive results over the past few years, but the learned solution is often evaluated only by statistical testing, and there is no systematic method to guarantee that the policy synthesised using RL meets the expectations of the designer of the algorithm. This becomes a particularly pressing issue when applying RL to safety-critical systems.
Furthermore, tasks featuring extremely sparse rewards are often difficult to solve by deep RL if exploration is limited to low-level primitive action selection. Despite its generality, deep RL is not a natural representation of how humans perceive sparse-reward problems: humans already have prior knowledge and associations regarding elements and their corresponding function in a given environment, e.g. "keys open doors" in video games. Given useful domain knowledge and associations, a human expert can find solutions to problems involving sparse rewards, e.g. when rewards are received only once a task is eventually fulfilled (e.g., finally unlocking a door). These assumed high-level associations can provide initial knowledge about the problem, whether in abstract video games or in numerous real-world applications, helping to efficiently find globally optimal policies while avoiding exhaustive and unnecessary exploration, particularly in the early stages.
The idea of separating the learning process into two (or more) synchronised low- and high-level learning steps has led to hierarchical RL, which specifically targets sparse-reward problems [49]. Practical approaches in hierarchical RL, e.g. options [43], depend on state representations and on the simplicity and/or structure of the underlying problem, such that suitable reward signals can be effectively engineered by hand. These methods often require detailed supervision in the form of explicitly specified high-level actions or intermediate supervisory signals [2,9,29,30,43,52]. Furthermore, most hierarchical RL approaches either work only in discrete domains or require pre-trained low-level controllers. HAC [32], a state-of-the-art method in hierarchical RL for continuous-state/action Markov Decision Processes (MDPs), introduces the notion of Universal MDPs, which have an augmented state space obtained from a set of strictly sequential goals.
This contribution extends our earlier work [19,20,22,57] and proposes a one-shot¹ and online deep RL framework, where the learner is presented with a modular high-level mission task over a continuous-state/action MDP. Unlike hierarchical RL, the mission task is not limited to sequential tasks; instead, it is specified as a Linear Temporal Logic (LTL) formula, namely a formal, un-grounded, high-level representation of the task and of its components. Without requiring any supervision, each component of the LTL property systematically structures a complex mission task into partial task "modules". The LTL property essentially acts as a high-level exploration guide for the agent, while low-level planning is handled by a deep RL architecture. LTL is a temporal logic that allows one to formally express tasks and constraints in an interpretable form: there exists a substantial body of research on the extraction of LTL properties from natural language [16,38,55].
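For instance, a hypothetical mission "reach region a, then region b, while always avoiding unsafe states" can be written in LTL as

    \Diamond (a \wedge \Diamond b) \wedge \Box \neg \mathit{unsafe},

where \Diamond reads "eventually" and \Box reads "always"; the atomic propositions a, b, and unsafe are assumed labels over regions of the state space, introduced here purely for illustration.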
We synchronise the high-level LTL task with the agent/environment: we first convert the LTL property to an automaton, namely a finite-state machine accepting sequences of symbols [3]; we then construct on-the-fly² a synchronous product between the automaton and the agent/environment; and we define a reward function based on the structure of the automaton. With this automated reward-shaping procedure, an RL agent learns to accomplish the LTL task with maximal probability, and with no supervisory assistance: this is in general hard, if at all possible, with conventional or handcrafted RL reward-shaping methods [43,49,52]. Furthermore, as elaborated later, the structure of the product partitions the state space of the MDP, so that each partition is relevant only to a specific task module. Thus, when dealing with sparse-reward problems, the agent's low-level exploration is efficiently guided by the task modules, saving the agent from exhaustively searching through the whole state space.

¹ One-shot means that there is no need to master easy tasks first and then compose them together to accomplish a more complex task.
² On-the-fly means that the algorithm tracks (or executes) the state of an underlying structure (or a function) without explicitly constructing it.
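To make the synchronisation concrete, below is a minimal Python sketch of an on-the-fly product step with an automaton-derived reward. The environment interface, the labelling function, and the reward choice are illustrative assumptions, not the implementation used in the paper.

    class Automaton:
        """Finite-state machine obtained from an LTL formula (e.g. an LDBA)."""
        def __init__(self, delta, initial, accepting):
            self.delta = delta          # transition map: (state, label) -> state
            self.state = initial
            self.accepting = accepting  # set of accepting automaton states

        def step(self, label):
            # Track the automaton state without ever building the full product.
            self.state = self.delta.get((self.state, label), self.state)
            return self.state

    def product_step(env, automaton, label_fn, action, r_accept=1.0):
        """One transition of the synchronous product: the environment moves,
        the automaton reads the label of the new state, and the reward is
        derived from the automaton's structure (here: accepting states)."""
        s_next = env.step(action)             # assumed: returns the next state
        q = automaton.step(label_fn(s_next))  # automaton tracks task progress
        reward = r_accept if q in automaton.accepting else 0.0
        return (s_next, q), reward            # product state and shaped reward

Note that only the automaton state is stored between steps, which is what makes the construction on-the-fly: the product state space is never enumerated explicitly.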
Contributions. To tackle the discussed issues and push the envelope of the state of the art in RL, in this work we propose a modular Deep Deterministic Policy Gradient (DDPG) architecture based on [34,47]. This modular DDPG is an actor-critic architecture that uses deep function approximators, which can learn policies in continuous state and action spaces, optimising over task-specific LTL satisfaction probabilities. The contributions of this work are as follows:
2 Problem Framework

The expected discounted return U^π(s) of a policy π from state s (Definition 3) is

    U^\pi(s) = \mathbb{E}^\pi\left[\sum_{n=0}^{\infty} \gamma^n R(s_n, a_n) \,\middle|\, s_0 = s\right],

where \mathbb{E}^\pi[\cdot] denotes the expected value given that the agent follows policy π, γ ∈ [0, 1) (γ ∈ [0, 1] in the episodic case) is a discount factor, R : S × A → \mathbb{R} is the reward function, and s_0, a_0, s_1, a_1, ... is the sequence of state-action pairs generated by policy π.
It has been shown that constant discount factors might yield sub-optimal policies [7,17]. In general, the discount factor γ is a hyper-parameter that has to be tuned. There is standard work in RL on state-dependent discount factors [37,40,54,56], which are shown to preserve convergence and optimality guarantees. A possible tuning strategy that resolves the issues of constant discounting is

    \gamma(s) = \begin{cases} \eta & \text{if } R(s,a) > 0, \\ 1 & \text{otherwise,} \end{cases}    (1)

where 0 < η < 1 is a constant. Hence, Definition 3 reduces to [37]:
    U^\pi(s) = \mathbb{E}^\pi\left[\sum_{n=0}^{\infty} \gamma(s_n)^{N(s_n)} R(s_n, \pi(s_n)) \,\middle|\, s_0 = s\right], \quad 0 \le \gamma \le 1,    (2)

where N(s_n) is the number of times a positive reward has been observed at s_n. The function U^π(s) is often referred to as the value function (under the policy π).
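As a concrete reading of (1)–(2), the following Python sketch estimates U^π(s) by Monte Carlo rollouts under the state-dependent discount. The environment and policy interfaces, the hashable state space, and the exact placement of the N(s) increment are assumptions for illustration (the increment follows one plausible reading of (2)).

    from collections import defaultdict

    def estimate_value(env, policy, gamma_fn, horizon=1000, episodes=100):
        """Monte Carlo estimate of U^pi(s0): each reward R(s_n, a_n) is
        weighted by gamma(s_n)^N(s_n), where N(s_n) counts the positive
        rewards observed at s_n so far, as in Eq. (2). Assumed interfaces:
        env.reset() -> s0, env.step(a) -> (next_state, reward)."""
        total = 0.0
        for _ in range(episodes):
            s = env.reset()
            n_pos = defaultdict(int)   # N(s): positive rewards seen at s
            ret = 0.0
            for _ in range(horizon):
                a = policy(s)
                s_next, r = env.step(a)
                ret += (gamma_fn(s) ** n_pos[s]) * r
                if r > 0:
                    n_pos[s] += 1      # count after weighting (one reading of N)
                s = s_next
            total += ret
        return total / episodes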
Another crucial notion in RL is the action-value function Q^π(s, a), which describes the expected discounted return after taking action a in state s and thereafter following policy π:

    Q^\pi(s,a) = \mathbb{E}^\pi\left[\sum_{n=0}^{\infty} \gamma^n R(s_n, a_n) \,\middle|\, s_0 = s,\, a_0 = a\right].
Accordingly, the recursive form of the action-value function is

    Q^\pi(s,a) = R(s,a) + \gamma\, Q^\pi(s_1, a_1),    (3)

where a_1 = π(s_1). Q-learning (QL) [53] is the most extensively used model-free RL algorithm built upon (3) for MDPs with finite state and action spaces. For all state-action pairs, QL initialises a Q-function Q^β(s, a) with an arbitrary finite value, where β is an arbitrary stochastic policy, and then repeatedly applies the update

    Q^\beta(s,a) \leftarrow Q^\beta(s,a) + \mu\left[R(s,a) + \gamma \max_{a' \in A} Q^\beta(s', a') - Q^\beta(s,a)\right],    (4)

where μ is the learning rate and s' is the state reached from s under action a.
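For concreteness, here is a minimal tabular Q-learning loop implementing update (4) with ε-greedy exploration; the environment interface (reset/step returning a done flag) is an assumed convention, not part of the paper.

    import random
    from collections import defaultdict

    def q_learning(env, actions, mu=0.1, gamma=0.99, eps=0.1, episodes=500):
        """Tabular Q-learning with update rule (4). Assumed interfaces:
        env.reset() -> initial state; env.step(a) -> (next_state, reward, done)."""
        Q = defaultdict(float)   # Q[(s, a)], arbitrarily initialised to 0
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy action selection
                if random.random() < eps:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda b: Q[(s, b)])
                s_next, r, done = env.step(a)
                # update (4): move Q(s,a) toward the one-step target
                target = r + gamma * max(Q[(s_next, b)] for b in actions)
                Q[(s, a)] += mu * (target - Q[(s, a)])
                s = s_next
        return Q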
Discussion. In all the experiments, the modular DDPG algorithm has been able to automatically modularise the given LTL task and to synthesise a successful policy. We have employed stand-alone DDPG as a baseline for comparison. In the cart-pole example, without synchronising the LDBA with the actor/environment, stand-alone DDPG cannot learn a policy for the non-Markovian task in (12). Hierarchical RL methods, e.g. [32], are able to generate goal-oriented policies, but only when the mission task is sequential and not particularly complex: as such, they would not be useful for (12). Furthermore, state-of-the-art hierarchical RL has a number of extra hyper-parameters to tune, such as the sub-goal horizons and the number of hierarchical levels, which the one-shot modular DDPG does not have. The task in (11) is chosen to elucidate how the modular DDPG algorithm can handle cases in which the generated LDBA has non-deterministic ε-transitions.

Fig. 8. (a) Path generated by the policy learnt via modular DDPG around the Victoria crater, vs. (b) the actual Mars rover traverse map [48].

Table 1. Success rates: statistics are taken over at least 200 trials.
In the Mars rover examples, the reward function for the stand-alone DDPG is r_p if the agent visits all the checkpoints in the proper order, and r_n if the agent enters regions with unsafe labels. However, the performance of stand-alone DDPG has been quite poor, due to inefficient navigation. In relatively simple tasks, e.g. the Melas Chasma experiment, stand-alone DDPG has achieved a positive success rate, but at the cost of a very high sample complexity in comparison to modular DDPG. Specifically, owing to its modular structure, the proposed architecture requires fewer samples to achieve the same success rate. Each module encompasses a pair of local actor-critic networks, which are trained towards their own objective and, as discussed before, only samples relevant to the sub-task are fed to the networks. In standard DDPG, by contrast, the whole sample set is fed into a single large-scale pair of actor-critic networks, which reduces sample efficiency.
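The sample routing just described might look as follows in Python; the module and buffer structure, and keying modules by automaton state, are illustrative assumptions rather than the paper's released code.

    from collections import defaultdict

    class Module:
        """Stand-in for one local actor-critic pair with its own replay buffer."""
        def __init__(self):
            self.buffer = []
        def add(self, transition):
            self.buffer.append(transition)
        def train_step(self):
            pass  # one local DDPG update on self.buffer (omitted in this sketch)

    modules = defaultdict(Module)  # one module per automaton state, created lazily

    def route(transition, aut_state):
        """Feed a transition only to the module that owns the current sub-task,
        so each actor-critic pair trains solely on samples relevant to it."""
        m = modules[aut_state]
        m.add(transition)
        m.train_step()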
5 Conclusions

We have discussed a deep RL scheme for continuous-state/action decision-making problems under LTL specifications. Synchronising the automaton expressing the LTL formula with deep RL automatically modularises a global complex task into easier sub-tasks. This setup assists the agent in finding an optimal policy with a one-shot learning scheme. The high-level relations between sub-tasks become crucial when dealing with sparse-reward problems, as the agent's exploration is efficiently guided by the task modules, saving the agent from exhaustively exploring the whole state space and thus improving sample efficiency.
Acknowledgements. The authors would like to thank Lim Zun Yuan for valuable discussions and technical support, and the anonymous reviewers for feedback on previous drafts of this manuscript.