
UNIVERSIDAD DE BUENOS AIRES

Facultad de Ciencias Exactas y Naturales

Departamento de Computación

Modelado y Simulación Híbrida de Redes Complejas de Datos

Thesis submitted for the degree of Doctor of the Universidad de Buenos Aires

in the area of Computer Science

Matías Alejandro Bonaventura

Thesis Director: Dr. Rodrigo Castro

Academic Advisor: Dr. Agustín Gravano

Workplace: Discrete Event Simulation Laboratory (SED), Departamento de Computación (DC) and Instituto de Ciencias de la Computación (ICC-CONICET), Facultad de Ciencias Exactas y Naturales (FCEyN), Universidad de Buenos Aires (UBA).
Defense date: March 29, 2019

Buenos Aires, March 2019


Hybrid Modeling and Simulation of
Complex Data Networks
Abstract

This Thesis develops new simulation techniques that combine packet-level and fluid-flow modeling
approaches for the study of dynamics in data networks. Novel practical and theoretical tools for
modeling and simulation are introduced to seamlessly integrate these two very disparate approaches
under a unifying hybrid formal framework.
The Trigger and Data AcQuisition (TDAQ) network of the ATLAS particle detector at CERN
is adopted as a large, complex, real-world case study to validate the obtained simulation models.
These models supported design, planning and tuning decisions in network engineering projects
targeting upgrades planned for 2021 and 2027.
Packet-level simulations yield fine-grained results comparable to real events in data networks.
Yet, the underlying complexity makes the approach unsuitable for complex high-speed networks,
as the required simulation times scale at least linearly with the size or aggregate throughput of
the system.
Meanwhile, a fluid-flow approach reduces model complexity by relying on fluid approximations
based on Ordinary Differential Equations (ODEs). This strategy yields faster simulations that are
typically insensitive to the aggregate system throughput, at the cost of capturing only averaged,
coarse-grained network dynamics.
Each approach requires substantially different background and tools, leading network experts
to adhere to one of them. This usually results in a diversification of simulation algorithms, models
and types of analysis, hindering their integration.
In this Thesis, new techniques are developed that allow the coexistence and interaction of
packet-level and fluid-flow models under the Discrete Event System Specification (DEVS) for-
malism, helping to reduce the gap between both approaches. We show how hybrid models are
able to retain the performance advantages of fluid-flow models while providing detailed simulation
traces of packet-level models for selected flows. The latter is achieved under formal guarantees of
stability and convergence of the underlying numerical integration methods.
In particular, the Quantized State Systems (QSS) family of numerical methods was extended to
support the solution of Retarded Functional Differential Equations with implicit delays, a theoret-
ical tool required to describe the macroscopic dynamics of network protocols under closed-loop
control.
The main overall outcome is a new generic and reusable library of models that enables the
hybrid (discrete/continuous) study of data networks. Network experts can flexibly choose the
desired simulation granularity while retaining a unified and intuitive modeling experience, centered
on the modular and hierarchical definition of topologies and the configuration of properties for
network components.

Keywords: Modeling, Simulation, Data Networks, Hybrid Models

Modelado y Simulación Híbrida de Redes
Complejas de Datos
Resumen

This Thesis develops new techniques that combine packet-level modeling and fluid approximations
for the study of dynamics in data networks. Novel theoretical and practical modeling and simulation
tools are introduced to combine these two disparate approaches interchangeably under a hybrid,
formal and unifying framework.
The data acquisition (DAQ) network of the ATLAS experiment at CERN was taken as a complex,
real-world case study to validate the results obtained by the simulation models. These supported
design decisions, capacity planning and tuning activities in real network engineering projects to be
deployed between 2021 and 2027.
The packet-level approach provides fine-grained results close to those observable in real networks.
However, the complexity represented in the models makes this approach inappropriate for the
simulation of complex high-throughput networks, since simulation times scale at least linearly
with the size or aggregate transfer rate of the system.
On the other hand, the fluid approach reduces model complexity by relying on flow approximations
with Ordinary Differential Equations (ODEs). This approach results in shorter simulation times,
generally independent of the transfer rate, but it captures only averaged, coarse-grained dynamics.
Each approach requires substantially different knowledge and tools, so network experts usually
adopt only one of them. This tends to lead to a diversification of simulation algorithms and
analysis practices that hinders the integration of strategies.
In this Thesis, new techniques are developed that allow the coexistence and interaction of packet-
level models with fluid models under the Discrete Event System Specification (DEVS) formalism,
helping to reduce the gap between both approaches. We show how hybrid models retain the
simulation-time advantages of fluid models while also providing the detailed traces of packet-level
models. This is achieved under formal guarantees of stability and convergence of the underlying
numerical integration methods.
In particular, the Quantized State Systems (QSS) family of numerical integration methods was
extended to approximate functional differential equations with variable delays, a tool required to
describe the macroscopic dynamics of protocols under closed-loop control.
As a final result, new libraries of generic and reusable models were obtained for the hybrid
(discrete/continuous) study of data networks. These allow the desired simulation granularity to
be chosen flexibly, while maintaining an intuitive modeling experience centered on the modular
and hierarchical definition of the network topology and the parameters of its components.

Keywords: Modeling, Simulation, Data Networks, Hybrid Models.

Acknowledgments

First of all I want to thank Rodrigo, who was my advisor for the doctorate and, for everything
else, will always be my mentor and a great friend. Thank you for all the support throughout the
development of this thesis, always present and helping with everything from organizational matters
down to the smallest technical details (it is not easy to find an advisor like that!). Thank you also
for putting up with my ideas and detours, for being there with good advice in the difficult moments,
and even for encouraging my crazy plans. Words are not enough to thank Rodri for everything he
did during these years. Many thanks also to Matt for his patience, and to the simulation group at
the university: Lucio, Dani, Ale, Eze and Andy. Thank you for sharing the offices (here and there)
and the work without which this Thesis could not have achieved the same results.
I also thank the Universidad de Buenos Aires for the excellent education I received, CONICET
for the doctoral scholarship, and CERN for the collaborative projects. Special thanks to Giovanna
Lehmann Miotto and Wainer Vandelli at CERN for their trust and for giving me the opportunity
to be part of a great team, where I learned the power of collaborating with people who are
passionate about their work. Many thanks also to all my colleagues at CERN: Eukeni, Fabrice,
Giussepe, Jorn, Alejandro and Adam, who were essential for the progress of this Thesis and for
my academic development. Not only for the many technical discussions, but also for their human
values and humility, always willing to collaborate and to look for better solutions. Of course I also
have to thank the CERN football team and Gozo (too many to mention one by one), who were
key to making me feel a little more at home and to finding that space to unwind and have fun.
I could never have finished this Thesis without the support of my lifelong friends in Buenos Aires:
Maru, Jinkis, Juancito, Fede, El Negro, Sofi, Zaina, El Blanco, and also, no longer in Argentina,
Jony, El Pelado, Rais and Jr. Even if I do not keep in touch that often, every time we get together
we understand each other as if no time had passed. It is wonderful to know that you are always
there with open arms, to share opinions over beers and asados. My gratitude naturally extends
to the beautiful families they formed with La Brujis, Barbi, Male, Mumi and Franquito. I never
stop marveling every time I see them grow.
In Geneva I also met wonderful people who quickly won my heart. Pablo, with whom, as soon as
we met, we understood we would be good friends, and who is now like my older brother. Thanks
to Pablo I spent the most fun nights of the doctorate, and he was the one who showed me the
little world of Geneva, despite being forced to host me so many times. To Massimo, my new
brother in Geneva, I cannot stop being grateful for the many conversations and fun outings.
And how could I not mention the girls: Vani, Viole, Sandra, Ceci, Stellita, always so much fun
and such good companions in the good moments and the bad. And with Emi, who brightens my
day every time I hear "Tio Mati", we completed our little Argentine family. Thank you for all
the dinners, conversations, discussions and the many moments we shared.
My greatest thanks go to my mom and dad, María del Carmen and Roberto. Having reached
this point, I owe it to all the efforts they made over so many years. My dad, who from the time I
was little planted in me the seed of curiosity with many conversations, questions and answers, and
who gave me my first computer. My mom, who always put the education of her children above
everything, taught us the values of humility and of fighting for what one loves. I hope that in the
future I can be as good a parent as they were to me.
To Melisa and Martín (lucky them, a double acknowledgment!), my siblings, companions and
friends. Thank you for all the conversations and reflections that so often helped make the distance
feel less far. Even though I often miss you and we cannot share the day-to-day, I know we will
always be united at heart.
Finally, and above all, I have to thank Ayelén, the love of my life and the woman with whom I
choose to share every day. She is almost directly responsible for my having continued with this
doctoral Thesis, and the material author of many figures in this document. Thank you for the
infinite patience when I came home exhausted, and for always being by my side in the important
moments. Without her infinite love it would have been impossible to finish this Thesis and to
embark on this new stage that now begins.

Matías Bonaventura
Contents

Acknowledgments 4

1 Introduction 14
1.1 Main Original Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2 Supporting Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3 Overall Related Work and Relevance of Contributions . . . . . . . . . . . . . . . . . 18
1.4 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Introduction (summary in Spanish) 21

2 Background 22
2.1 Network Simulation Approaches: Packet-Level and Fluid-Flow . . . . . . . . . . . . 22
2.1.1 Network Performance Evaluation Techniques . . . . . . . . . . . . . . . . . . 22
2.1.2 Scaling Limits: From Stochastic Discrete Systems to Deterministic Fluid
Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1.2.1 Zooming Out to Reduce Complexity . . . . . . . . . . . . . . . . . 24
2.1.2.2 On Markov Processes . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1.3 Network Simulation Approaches . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2 Hybrid Dynamic Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Classic Integration Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 The Discrete Event System Specification (DEVS) Formalism . . . . . . . . . . . . . 29
2.4.1 DEVS Atomic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.4.2 DEVS Coupled Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.4.3 DEVS Abstract Simulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4.4 Vectorial DEVS Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.5 The Quantized State Systems (QSS) Methods to Solve ODEs in Hybrid Systems . . 33
2.5.1 Properties of QSS Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.5.2 Relationship Between QSS and DEVS . . . . . . . . . . . . . . . . . . . . . 36
2.6 PowerDEVS Simulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Background (summary in Spanish) 39

3 Motivating Case Study and Methodology 40


3.1 Case Study Scenario: The ATLAS Data Acquisition Network at CERN . . . . . . . 41
3.1.1 The Large Hadron Collider at CERN . . . . . . . . . . . . . . . . . . . . . . 41
3.1.2 The ATLAS Detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42


3.1.3 Trigger and Data Acquisition System . . . . . . . . . . . . . . . . . . . . . . 44


3.2 Modeling and Simulation-Driven Methodology . . . . . . . . . . . . . . . . . . . . . 44
3.2.1 Context and Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2.2 DEVS-Based Iterative Methodology . . . . . . . . . . . . . . . . . . . . . . . 45
3.2.2.1 Cycles and Phases . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2.3 Relationship with Existing Techniques and Methods . . . . . . . . . . . . . . 47
3.3 Other Methodological Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 48

Case Study and Methodology (summary in Spanish) 49

4 Packet-Level Network Simulation 50


4.1 Introduction and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2 Preliminaries and Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2.1 Relevant Network Congestion Control Mechanisms . . . . . . . . . . . . . . 51
4.2.1.1 Transmission Control Protocol (TCP) . . . . . . . . . . . . . . . . 51
4.2.1.2 Traffic Shaping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.2.2 Data Network Simulation Tools . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2.3 Queuing Theory Results for Load-Balancing . . . . . . . . . . . . . . . . . . 57
4.2.4 Applications and Data Flow in the DAQ network . . . . . . . . . . . . . . . 59
4.2.4.1 Other simulation studies of the TDAQ system . . . . . . . . . . . . 59
4.2.4.2 The Future FELIX Network . . . . . . . . . . . . . . . . . . . . . . 60
4.3 Simulation model of the Network layer . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3.1 Modeling the TCP Congestion Control . . . . . . . . . . . . . . . . . . . . . 62
4.3.2 Preliminary TCP Model Validation Via Comparison With a Network Simulator 63
4.3.3 Modeling Network Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3.3.1 Packet Data Structures and Low-Level Network Models . . . . . . 65
4.3.3.2 Topologies and High-Level Network Models . . . . . . . . . . . . . 68
4.3.3.3 Case Study: Creating Larger Topologies . . . . . . . . . . . . . . . 69
4.4 Simulation model of the TDAQ Network Data Flow . . . . . . . . . . . . . . . . . . 72
4.4.1 Relevant Applications in the TDAQ System (HLT Applications) . . . . . . . 72
4.4.2 The DEVS-Based Model for the TDAQ Data Flow . . . . . . . . . . . . . . 75
4.4.3 Model Validation and Application: Reproducing and Studying the TDAQ
System via Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.4.3.1 Traffic-shaping behaviour (application-level) . . . . . . . . . . . . . 78
4.4.3.2 Event Building Latency (Network/Application Level) . . . . . . . 79
4.5 Exploring TDAQ Load Balancing Options Through Modeling and Simulation . . . . 80
4.5.0.1 Load-Balancing in the TDAQ Network . . . . . . . . . . . . . . . . 81
4.5.1 Load-Balancing Model and Studied Strategies . . . . . . . . . . . . . . . . . 82
4.5.1.1 DEVS Model Implementation . . . . . . . . . . . . . . . . . . . . . 84
4.5.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.5.2.1 ILB policy for Finite Servers . . . . . . . . . . . . . . . . . . . . . . 85
4.5.2.2 Performance Comparison for Different Policies . . . . . . . . . . . . 86
4.5.2.3 Critical Regimes for Efficient Policies . . . . . . . . . . . . . . . . . 86
4.5.2.4 Job Size Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . 91
4.5.3 Real System Improvement Proposal . . . . . . . . . . . . . . . . . . . . . . . 91

4.5.3.1 Testing the Hypothesis on the Model . . . . . . . . . . . . . . . . . 91


4.5.3.2 Implementation and Validation in the Real System . . . . . . . . . 92
4.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

Packet-Level Simulation (summary in Spanish) 95

5 Fluid-Flow Network Simulation 96


5.1 Introduction and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.2 Preliminaries and Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.2.1 Fluid-flow Network Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.2.2 Hybrid Network Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.2.3 Delay Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.2.4 Simulation of Discontinuous Systems . . . . . . . . . . . . . . . . . . . . . . 102
5.3 Modeling a Fluid Buffer-Server System . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.3.1 Mathematical Characterization . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.4 Numerical Solving of Retarded FDEs with Implicit Delays . . . . . . . . . . . . . . 109
5.4.1 Transforming Delay Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.4.2 New Forward Delay QSS (FDQSS) . . . . . . . . . . . . . . . . . . . . . . . 112
5.4.2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.4.2.2 Analytic Calculation of Polynomial Coefficients for the FDQSS Method 114
5.4.2.3 Conditions for the Time Instants of Polynomial Segments . . . . . 116
5.4.2.4 DEVS-based Algorithm to Obtain d(t + τ(t)) = y(t) . . . . . . . . 116
5.4.3 Experiments with the Buffer-Server System . . . . . . . . . . . . . . . . . . 118
5.4.4 Numerical Simulation of Sharp Discontinuities: The QSS Bounded Integrator 122
5.5 A Modular Approach for Fluid-Flow Network Modeling . . . . . . . . . . . . . . . . 124
5.5.1 Introducing Fluid Entities for Simplified Fluid Modeling . . . . . . . . . . . 125
5.5.2 Basic Low-Level Components encapsulating ODEs . . . . . . . . . . . . . . . 127
5.5.2.1 Fluid Data Network Sources: TCP Window and Unresponsive
Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.5.2.2 Buffering Mechanisms: Tail-Drop and RED Queues . . . . . . . . . 129
5.5.2.3 Multiplexing Fluid Entities: Fluid-Flow routing . . . . . . . . . . . 133
5.5.3 Modular Construction of Fluid-Flow Topologies . . . . . . . . . . . . . . . . 133
5.5.3.1 High-Level Fluid-Flow Models . . . . . . . . . . . . . . . . . . . . . 134
5.5.4 Modeling of Fluid-Flow Topologies . . . . . . . . . . . . . . . . . . . . . . . 135
5.5.5 Experiments with Fluid-Flow Models . . . . . . . . . . . . . . . . . . . . . . 137
5.5.5.1 Experiment 1: Single TCP sessions . . . . . . . . . . . . . . . . . . 137
5.5.5.2 Experiment 2: Multiple TCP Sessions and Interconnected Queues . 138
5.5.5.3 Experiment 3: Performance Scalability Analysis . . . . . . . . . . . 140
5.6 Hybrid Network Simulation: Integrating Fluid-Flow and Packet-Level models . . . . 140
5.6.1 Data Structures for Hybrid Models . . . . . . . . . . . . . . . . . . . . . . . 142
5.6.2 Turning Discrete Packets Into Continuous Signals: The Hybrid Link . . . . . 142
5.6.3 Hybrid Queue: Turning Continuous Buffer-Server Metrics into Discrete Events 143
5.6.4 Hybrid Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
5.6.5 Controlling Performance Through Smoothing Continuous Signals . . . . . . 146
5.6.6 Experiments with the Hybrid Buffer-Server system . . . . . . . . . . . . . . 146

5.6.6.1 Experiment 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147


5.6.6.2 Experiment 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
5.6.6.3 Experiment 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
5.6.6.4 Experiment 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

Fluid-Flow Simulation (summary in Spanish) 156

6 Conclusions and Future Work 157


6.1 Open problems and future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

Bibliography 160

7 Appendix 172
7.1 Proof of Theorem 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7.2 New Enhancements for the PowerDEVS Simulation toolkit . . . . . . . . . . . . . . 172
7.2.1 Py2PDEVS: a Python ↔ PowerDEVS interface . . . . . . . . . . . . . . . . 172
7.2.2 Configuration of Simulation Parameters . . . . . . . . . . . . . . . . . . . . . 173
7.2.2.1 Stochastic Distribution parameters . . . . . . . . . . . . . . . . . . 175
7.2.3 Storage of Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . 176
7.2.4 Documentation and Docker Development Image . . . . . . . . . . . . . . . . 177
7.3 New PowerDEVS Library of Packet-Level Network Models . . . . . . . . . . . . . . 178
7.3.1 RoutingTable (packet-level, fluid-flow and hybrid) . . . . . . . . . . . . . . . 178
7.3.2 FlowGenerator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
7.4 New Library of Fluid-Flow Network Models . . . . . . . . . . . . . . . . . . . . . . 180
7.4.1 QSS Bounded Integrator Coupled Model . . . . . . . . . . . . . . . . . . . . 181
7.4.2 Reservoir Coupled Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
7.4.3 Buffer-Server system Coupled Model . . . . . . . . . . . . . . . . . . . . . . 182
7.5 New Library of Hybrid Network Models . . . . . . . . . . . . . . . . . . . . . . . . . 183
7.5.1 Packet2HybridFlow Atomic Model . . . . . . . . . . . . . . . . . . . . . . . 184
7.5.2 HybridMerge Atomic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
7.5.3 HybridDemux Atomic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
7.5.4 Hybrid RED Port Coupled Model . . . . . . . . . . . . . . . . . . . . . . . . 185
7.5.5 Hybrid Router Coupled Model . . . . . . . . . . . . . . . . . . . . . . . . . . 186
List of Figures

2.1 Example of a generic discrete-events trajectory . . . . . . . . . . . . . . . . . . . . . 30


2.2 Basic DEVS atomic models (left) and coupled models (right) . . . . . . . . . . . . . 30
2.3 Example events/states trajectories for a DEVS atomic model. . . . . . . . . . . . . 31
2.4 Hierarchical simulation of DEVS models and DEVS abstract simulator . . . . . . . 33
2.5 Quantized state function with hysteresis . . . . . . . . . . . . . . . . . . . . . . . . 34
2.6 Example of trajectories in a QSS Plot . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.7 Trajectories for different QSS orders of accuracy . . . . . . . . . . . . . . . . . . . . 36
2.8 Block diagram representation of the QSS state system in Equation (2.10) . . . . . . 37
2.9 PowerDEVS GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.1 The LHC accelerator facilities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41


3.2 The LHC upgrade schedule and associated luminosity . . . . . . . . . . . . . . . . . 42
3.3 ATLAS particle detector at CERN . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.4 ATLAS wide-area network usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.5 The ATLAS TDAQ system in Run 2. . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.6 Methodology diagram based on the DEVS formal framework . . . . . . . . . . . . . 46

4.1 State machine of TCP transmission phases . . . . . . . . . . . . . . . . . . . . . . . 53


4.2 Evolution of TCP Reno Congestion Window. . . . . . . . . . . . . . . . . . . . . . 54
4.3 RED drop probability function as defined by Equation 4.1. . . . . . . . . . . . . . 55
4.4 TDAQ data flow applications in Run2. . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.5 FELIX system components (as of 2015) . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.6 PowerDEVS and OMNET++ topologies used to cross-compare TCP behaviour. . . 63
4.7 Comparison of TCP congestion window in OMNET++ and PowerDEVS . . . . . . 65
4.8 NetworkPacket and IProtocol class diagram . . . . . . . . . . . . . . . . . . . . . . 66
4.9 PowerDEVS Packet-Level Model Libraries . . . . . . . . . . . . . . . . . . . . . . . 67
4.10 Simple packet-level topology using PowerDEVS network library . . . . . . . . . . . 69
4.11 Semi-automated topology modeling workflow with TopoGen for network simulation. 70
4.12 Topology of the FELIX system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.13 Simulated mean packet latency seen by the Traffic Monitoring servers . . . . . . . . 72
4.14 Topology and applications in the HLT TDAQ farm for Run2. . . . . . . . . . . . . 73
4.15 Sequence diagram of TDAQ applications involved in Event filtering . . . . . . . . . 74
4.16 DEVS TDAQ simulation model implemented in PowerDEVS . . . . . . . . . . . . . 75
4.17 Filtering latency versus initial DCM credits (real vs simulated results) . . . . . . . . 78
4.18 Average Event latency sweeping the HLTSV assignment rate . . . . . . . . . . . . . 80


4.19 Heatmap of the simulated HLT farm load for different HLTSV assignment policies . 81
4.20 Load-Balancing view of the HLT system. . . . . . . . . . . . . . . . . . . . . . . . . 82
4.21 Load balancing and Dispatcher models in PowerDEVS . . . . . . . . . . . . . . . . 84
4.22 Comparison of ILB scheduling with different numbers of servers . . . . . . . . . . . 88
4.23 Comparison of different strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.24 Performance of different strategies in the critical regime . . . . . . . . . . . . . . . . 90
4.25 Sensitivity of load balancing strategies to service time distribution . . . . . . . . . 91
4.26 Comparison of assignment policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

5.1 The Past, Present, and Future of Ethernet . . . . . . . . . . . . . . . . . . . . . . . 97


5.2 Flowchart of Fluid Model Solver for the MGT model . . . . . . . . . . . . . . . . . 100
5.3 Bouncing ball simulation result using Runge-Kutta . . . . . . . . . . . . . . . . . . 103
5.4 Event skipping in discrete–time algorithms . . . . . . . . . . . . . . . . . . . . . . . 105
5.5 Approximation errors when mixing fluids and packets . . . . . . . . . . . . . . . . . 105
5.6 A ball bouncing down stairs using QSS2 . . . . . . . . . . . . . . . . . . . . . . . . 106
5.7 PowerDEVS implementation to obtain the dynamic delay . . . . . . . . . . . . . . . 112
5.8 New FDQSS block diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.9 FDQSS algorithm based on polynomial segments . . . . . . . . . . . . . . . . . . . 114
5.10 Comparison of QSS Delay Blocks available in PowerDEVS . . . . . . . . . . . . . . 118
5.11 Buffer-server system experiment setups (fluid and packet-level) . . . . . . . . . . . . 118
5.12 Comparison of the fluid buffer-server system and packet-level queue . . . . . . . . . 120
5.13 Comparison of FDQSS, dynamic calculation of τ and discrete packet simulation
using a buffer-server system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.14 QBI block available in the hybrid library implemented in PowerDEVS . . . . . . . 123
5.15 Example comparing the available QSS integrators . . . . . . . . . . . . . . . . . . . 124
5.16 Class Hierarchy for Fluid-Flow Entities . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.17 Models enabling Fluid Entities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.18 Fluid-Flow Data Source blocks in the Low-Level Library . . . . . . . . . . . . . . . 129
5.19 RED Queue DEVS coupled model composition . . . . . . . . . . . . . . . . . . . . 130
5.20 Fluid-Flow Queue blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.21 Block-oriented representation of equations implemented in PowerDEVS . . . . . . . 132
5.22 Fluid-Flow demultiplexer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.23 High-Level Fluid-Flow Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5.24 Simple Fluid-Flow topology using PowerDEVS network library . . . . . . . . . . . . 136
5.25 Packet-Level topology to be compared with the Fluid-Flow topology . . . . . . . . . 137
5.26 Topologies for experiment 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.27 Comparison of packet-level and fluid simulations for Experiment 1 . . . . . . . . . . 138
5.28 Packet-level vs fluid-flow with 2 RED and 120 TCP sessions . . . . . . . . . . . . . 139
5.29 Packet-level and fluid-flow execution times for Experiment 3 . . . . . . . . . . . . . 140
5.30 HybridFlow class within the complete network data structure hierarchy . . . . . . . 142
5.31 Hybrid link receiving discrete packets and generating fluid-flow continuous signals. . 143
5.32 Hybrid queue DEVS coupled model implemented in PowerDEVS . . . . . . . . . . 144
5.33 Hybrid topology: fluid-flow and packet-level hosts sharing a hybrid queue . . . . 145
5.34 Comparison of hybrid and packet-level queues fed by the same packet-level flow. . 147
5.35 Hybrid simulation with fluid-flow and packet-level flows sharing the same bottleneck 148

5.36 Hybrid simulation results for the foreground/background traffic experiment . . . . . 150
5.37 Hybrid simulation metrics with 20 packet-level and 20 fluid-flow TCP sessions . . . 152
5.38 Hybrid simulation results increasing the percentage of packet-level traffic . . . . . . 153

7.1 Py2PDEVS architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173


7.2 Example Py2PDEVS code to represent a network topology . . . . . . . . . . . . . 174
7.3 PowerDEVS Library of Packet-Level Network Models . . . . . . . . . . . . . . . . . 178
7.4 PowerDEVS Library of Fluid-Flow Network Models . . . . . . . . . . . . . . . . . 181
7.5 PowerDEVS Library of Hybrid Network Models . . . . . . . . . . . . . . . . . . . . 184
List of Tables

3.2 TDAQ requirements obtained during system analysis meetings . . . . . . . . . . . . 45

4.2 Overview of some network simulator comparisons (extended from [110]) . . . . . . . 57


4.3 FELIX traffic types and requirements. . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4 PowerDEVS and OMNET++ configuration parameters to compare TCP behaviour 64
4.5 Comparison TCP metrics in OMNET++ and PowerDEVS . . . . . . . . . . . . . . 64

5.2 Comparison of the Dynamic and Forward delay approaches for FDEs. . . . . . . . 119
5.4 Block interface for the TCP host . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.6 Block interface for the RED and tail-drop queue models. . . . . . . . . . . . . . . . 131
5.8 Comparison of fluid-flow and hybrid models . . . . . . . . . . . . . . . . . . . . . . 155

7.2 RoutingTable atomic model (packet-level, fluid-flow and hybrid) . . . . . . . . . . . 179


7.4 FlowGenerator atomic model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
7.5 QSS Bounded Integrator (BQI) coupled model . . . . . . . . . . . . . . . . . . . . . 181
7.6 Reservoir coupled model for 2 flows . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
7.7 Buffer-server system coupled model . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
7.9 Packet2HybridFlow atomic model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
7.11 HybridMerge atomic model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
7.13 HybridDemux atomic model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
7.14 Hybrid RED Port coupled model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
7.15 Hybrid Router coupled model with 3 ingress and 2 egress ports . . . . . . . . . . . . 186

Chapter 1

Introduction

It’s hard to make predictions, especially about the future

Niels Bohr

Simulation modeling is a cornerstone discipline for the study of dynamics in complex networked
systems. Large packet-switching computer networks are a paradigmatic case, where performance
evaluation is a cross-cutting activity in all stages of the conception of future technologies.
The central motivating case study in this Thesis is the Trigger and Data AcQuisition (TDAQ)
network in the ATLAS experiment at CERN [1]. TDAQ transports data generated by the ATLAS
detector and filters throughput from roughly 160Gbps down to approximately 1.6Gbps in quasi-
real time. A combination of complexities in terms of network size, throughput and the underlying
algorithms supporting its data flow makes TDAQ a challenging setting for application developers
and network designers.
Simulation models can provide useful tools for the understanding of communication patterns,
the fine-tuning of applications, protocols, and hardware, and also for the design and capacity
planning of network upgrades.
Another sound approach to performance evaluation is analytical modeling, where mathematical
formulations provide means and tools to reason symbolically about systems.
Meanwhile, measurement techniques also play a key role, both for inspiring new models based
on quantifiable evidence and for validating models when their solutions - either analytic or numeric -
become available.
While all models are abstractions of a given system under study, the choice of the right modeling
strategy depends heavily on the types of questions to be answered.
In this setting, there is widespread consensus that simulation models can offer greater degrees
of detail and flexibility than analytical models, at the expense of increasing costs in simulation
time and computing power as the complexity of the system grows [2], [3].

But does it always need to be the case?

Underlying the above assessment is the assumption that simulation pertains to the world of
detailed fine-grained discrete-event simulation (DES), while coarse-grained analytical models
provide results for steady-state network conditions.


Yet, there is also a special case of analytical models called fluid approximations [4]. They
work on the assumption that valid scaling limits exist for some underlying stochastic processes,
i.e. letting the time and/or space dimensions tend to infinity. The latter gives rise to continuous
dynamic equations that not only can capture the steady state of systems (in case it exists) but
also their transient dynamics.
Fluid approximations are of particular importance in the realm of simulation techniques as
the resulting equations seldom offer closed-form analytical expressions, thus calling for numerical
solutions pertaining to the domain of Ordinary Differential Equations (ODEs). They are almost
inevitably solved by discrete-time techniques, found in the vast body of knowledge for continuous
system simulation [5].
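For concreteness, a fluid TCP window equation in the spirit of widely referenced models such as [24] can be approximated with a classical discrete-time scheme (forward Euler). The sketch below is purely illustrative: the constant round-trip time R, constant loss probability p and the simplified equation form are assumptions made here for exposition, not the exact model used later in this Thesis.

```python
# Illustrative sketch (not the Thesis model): forward-Euler integration of a
# simplified fluid TCP window equation dW/dt = 1/R - W^2 * p / (2R), assuming
# a constant RTT R and a constant loss probability p.

def simulate_window(R=0.1, p=0.01, W0=1.0, dt=0.001, t_end=20.0):
    W, t, trace = W0, 0.0, []
    while t < t_end:
        dW = 1.0 / R - (W * W * p) / (2.0 * R)  # additive increase minus loss-driven decrease
        W += dt * dW                            # discrete-time (fixed-step) update
        t += dt
        trace.append((t, W))
    return trace

trace = simulate_window()
print(f"W(t_end) = {trace[-1][1]:.2f}")  # settles near the equilibrium sqrt(2/p) ~ 14.14
```

Note how the fixed step dt slices time uniformly regardless of the dynamics; this is precisely the discrete-time flavor of classical ODE solvers that contrasts with the event-driven QSS approach introduced later.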

In our view, this represents a turning point in the evolution of network evaluation, where the
simulation community has largely specialized either in discrete event simulation or in continuous
systems techniques and tools.
Our motivating case study in the TDAQ network is not an exception. In the TDAQ network,
a highly distributed system is composed of multiple applications that coordinate to process large
amounts of data in real-time. Communication and data flow patterns challenge standard network
protocols and hardware to achieve maximum performance. In this sense, fine-grained packet-level
models are needed to understand and fine-tune detailed network behavior.
On the other hand, the TDAQ network presents a farm with over 2000 servers interconnected
with Ethernet links of 1 and 10Gbps. Simulation execution times impose a practical limit on the
kind of questions that can be answered by packet-level models and for the level of details which
can be simulated within reasonable time frames.
Future upgrades are scheduled for 2021 and 2027 which will considerably increase the network
size and throughput, worsening the gap between packet-level performance capabilities and real-
world requirements [6]. In this sense, fluid-flow models can provide a practical solution to model
such high-throughput large-scale networks.
Unfortunately, in fluid-flow simulation only averaged metrics can be obtained, making it impossible
to analyze detailed per-packet events, which are sometimes crucial to understand relevant
dynamics of network protocols and applications.

A fundamental question arises: can we aim at having the best of both worlds?

The so-called hybrid simulation modeling treats the system as an interaction between discrete
and fluid parts. This idea first appeared in the mid 90s mainly driven by the concept of “cells”
in ATM networks, where a cell (a data packet of small and fixed-length) can be metaphorically
compared to an atom in a continuous fluid.
Although the idea is appealing and several studies tried to develop it, the approach did not
scale up. Currently, the bibliography on hybrid network simulation is remarkably scarce compared
to the tremendous growth in the network modeling and simulation literature, let alone data network
evaluation at large.
We claim this is a consequence of a lack of theoretical and practical tools to provide a sound
framework for hybrid systems simulation. At the heart of the problem is the issue of trying to
force one paradigm fit into another.
Even when there exist solutions to deal with discrete events in the context of continuous
simulation methods, discontinuities are often treated as exceptions and involve costly iterative
algorithms. Yet, in network simulation, a discrete event is a first-class citizen, on equal footing
with the synchronous time steps required to solve ODEs. Dealing with approximations of continuous
dynamics while treating very frequent asynchronous events exactly brings about several time
management problems. The approaches found in the literature are either smart ad-hoc algorithms
covering certain cases or forceful adaptations of particular discrete-event network simulation
packages.

In this Thesis we develop new theory and simulation techniques to unify the simulation of
fluid-flow and packet-level representations of data networks under a common formal framework.
A modeler should be able to adopt the same tool and formalism for dealing with diverse represen-
tations of the system under study, at very different levels of abstraction.
We rely on DEVS [7], a formalism that can cope simultaneously with the exact representation
of discrete systems and the approximation of continuous systems with any desired level of accuracy.
The DEVS abstract simulator acts as a universal scheduler that can accommodate naturally the
simulation of Quantized State Systems (QSS) [8]. The QSS methods are a family of numerical
integration algorithms based on the quantization of state variables rather than the slicing of time
into discrete steps, an approach that makes QSS belong to a discrete event class of solution for
continuous systems simulation [5].
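To fix ideas, the sketch below illustrates the first-order QSS principle in a drastically simplified form: integration events are scheduled when the state drifts one quantum away from its quantized value, instead of at fixed time steps. This toy code is for illustration only; real QSS implementations (e.g. in PowerDEVS) include hysteresis and higher-order variants.

```python
# Didactic toy (not the PowerDEVS implementation): first-order QSS for a
# scalar ODE dx/dt = f(q), where q is the quantized state. An "event" occurs
# when x drifts one quantum dQ away from q.

def qss1(f, x0, dQ=0.01, t_end=5.0):
    t, x = 0.0, x0
    q = x                      # quantized state tracks x in steps of dQ
    steps = 0
    while t < t_end:
        dx = f(q)              # slope is piecewise constant between events
        if dx == 0.0:
            break              # equilibrium: no further events scheduled
        dt = dQ / abs(dx)      # time until x drifts one quantum from q
        t += dt
        x += dx * dt           # exact advance under the constant slope
        q = x                  # requantize; next event re-evaluates f
        steps += 1
    return t, x, steps

# Example: dx/dt = -x, x(0) = 1; the numerical solution decays toward 0.
t, x, steps = qss1(lambda q: -q, 1.0)
print(t, x, steps)
```

Notice that fast dynamics (large |dx|) automatically produce closely spaced events while slow dynamics produce sparse ones, and that each event is an asynchronous discrete event: this is what lets QSS live naturally inside a DEVS scheduler alongside packet-level events.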
We consider that one of the main reasons why hybrid approaches did not scale up is an
undesirable intertwining of modeling concerns with simulation techniques [9], [10]. In this
regard, DEVS imposes a strict separation between models and simulators. Thus, the contributions
made at the simulation technique level are independent of any particular model, and therefore
much more generalizable than ad-hoc solutions (e.g. tool-specific, language-specific, etc.). The
DEVS framework is not tied to any particular software tool or programming language. Therefore,
a DEVS-based solution to a given problem offers a longer survival horizon than tool-based
solutions.
Yet, in this Thesis we also advocate strongly for practical and flexible tools usable in real-world
projects.
The TDAQ network provides an ideal real-world case study that motivates the development,
testing and validation of simulation models. In this context, a DEVS-based development methodology is
tailored to frame different modeling and simulation activities. The models developed throughout
this work have served to support decisions made for design, capacity planning and fine-tuning
projects in TDAQ.
Finally, although the focus is made on networked computer systems, several modeling and
simulation techniques contributed by this work can be naturally extended to a broader class of
complex networked systems, e.g. those found in physics, natural and social sciences.

1.1 Main Original Contributions


The main contribution of this Thesis is to prove the theoretical and practical feasibility of unifying
packet-level, fluid-flow and hybrid simulations under a common formal framework.
New simulation techniques, and their associated theoretical support are developed allowing
users to seamlessly specify network topologies at different abstraction levels, in a modular and
hierarchical way. Packet-level models are described in Chapter 4, and fluid-flow and hybrid models
are presented in Chapter 5.
We developed a new library of reusable, general purpose packet-level models based on the
DEVS formal framework. We proved that these models are effective to study large complex real
networks, such as the TDAQ network at the ATLAS experiment (Section 4.4).
Using this library we then studied load balancing dynamics. Through systematic simulations
we found new evidence that there is a class of efficient decentralized load balancing policies that
share a common critical regime and, moreover, can be interpreted as a generalization of well-known
regimes for which the blocking probability has a closed-form analytic expression. This regime is
the desired point of operation as it takes the best advantage of the available resources. Our results
indicate that it is possible to design decentralized load balancing policies that could reach the
same performance (in the fluid limit) as centralized policies, and using previously existing closed-
form equations. Guided by these findings, in Section 4.5, we proposed enhancements in the load
balancing algorithms of TDAQ. The changes were first verified in our TDAQ simulation model,
and then experiments in a real network validated the predictions.
We also present a novel library for hybrid simulation, described in Section 5.6, where fluid-
flow and packet-level models interact without resorting to ad-hoc time synchronizations or traffic
smoothing techniques. Experiments show that simulation times can be reduced by orders of
magnitude while the packet-level portions preserve very satisfactory levels of accuracy.
Additionally, a new modeling capability is presented to compose fluid-flow networks modularly,
allowing users to build fluid topologies resembling classic computer network diagrams. Moreover,
this is achieved without requiring the user to deal explicitly with the underlying system of ordinary
differential equations (which gets composed, and later solved, automatically and transparently).
This is discussed in Section 5.5.
From the theoretical point of view, a new Forward Delay QSS (FDQSS) numerical method is
presented. FDQSS supports the numerical approximation of Retarded Functional Delay Equations
(RFDE) with implicit delays, in the context of QSS integration. We prove that under reasonable
assumptions FDQSS provides guarantees of numerical asymptotic convergence at any desired
accuracy for RFDE-type systems. This is a general result, applicable to any kind of system
represented by RFDEs, which are of crucial importance to deal with ubiquitous delayed dynamics
in data networks (where delays depend on complex dynamics, including the delays themselves, in
a recursive fashion).
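As an illustration of the recursive delayed dynamics mentioned above (the symbols below are chosen for exposition only; the concrete equations appear in Chapter 5), consider a queue whose input is delayed by a round-trip time that itself depends on the queue occupancy:

```latex
\dot{q}(t) = \lambda\bigl(t - \tau(t)\bigr) - C,
\qquad
\tau(t) = d_p + \frac{q(t)}{C}
```

where q is the queue occupancy, C the link capacity, d_p a fixed propagation delay and \lambda the input rate. The delay \tau depends on the state q, whose dynamics in turn depend on \tau, yielding an RFDE with an implicit delay.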

1.2 Supporting Publications


The results presented in this Thesis were partially published in the following peer-reviewed publi-
cations:

• In [11] we define an iterative DEVS-oriented model development methodology, tailored for
the modeling and simulation project of the TDAQ network in the ATLAS experiment at
CERN. This is presented in Chapter 3. The publication partially describes results presented
in Chapter 4 obtained with the new packet-level model libraries.

• In [12] we applied a simulation-based strategy to characterize systematically and extensively
several classes of balancing policies under various statistical assumptions. This is presented
here in Chapter 4 in the context of the TDAQ network.

• In [13], we present our modeling approach to unify the experience of designing network
simulation models both with fluid-level and packet-level techniques under a single modular
and hierarchical formal framework. This is discussed in Chapter 5 along with the additional
development of hybrid models.

• The publications [14] and [15] describe MASADA and TopoGen, respectively. These are
tools to systematize and automate modeling, simulation and analysis tasks in general, and
were applied to assist practical simulation-based studies for the TDAQ system. A scenario
that uses TopoGen is included in Chapter 4.

A synthesis of the above listed results was presented in the form of an extended abstract at
the 51st Winter Simulation Conference (Gothenburg, Sweden, 2018) for the Ph.D. Colloquium
(awarded the Best Ph.D. Student Paper Award by ACM-SIGSIM) [16].
Also, in [17] a generic tool named Py2PDEVS was developed to compose large topologies with
the Python language, interconnecting PowerDEVS blocks programmatically as an alternative to
the drag-and-drop-oriented PowerDEVS GUI.

1.3 Overall Related Work and Relevance of Contributions


Previous work relevant for this Thesis can be organized according to well-known simulation tech-
niques: 1) packet-level simulation tools, 2) fluid-flow network models, and 3) hybrid network
simulation.

Packet-level network simulation. The literature and the number of network simulation tools are
vast, both in academic and commercial environments [3]. Some of the most common network
simulators mentioned in the literature are ns-2 [18], ns-3 [19], OMNET++ [20], OPNET [21] and JiST
[22], just to name a few. Each tool has its advantages and disadvantages, and the available network
simulators provide a great variety of features: off-the-shelf network protocols, traffic/topology
generation, parallelization algorithms, graphical interfaces, etc. These tools have been widely used
both for research and commercial projects, where they proved very useful for network performance
evaluation [23].
In this sense, our approach to packet-level simulation follows a similar strategy as most of the
main simulation tools already available: use discrete-event simulation to provide detailed models
for as many network components as possible. A clear distinction of our approach is that we rely on
a formal discrete-event specification and methodology, DEVS. One of the many possibilities thus
opened is the representation of other modeling structures within the same formal specification,
in particular continuous dynamic systems, among others. The tools developed in this Thesis are
not intended to compete with well-established network simulation tools in terms of the number of
features. The goal here is to demonstrate that formal discrete-event modeling and simulation is
able to represent large complex real networks. This is demonstrated in Chapter 4 in the context of
the DAQ network in the ATLAS experiment, where we show that simulation is able to reproduce
network and application metrics as well as predict the impact of proposed changes.

Fluid-flow network simulation. The literature is not so abundant and is found mainly in
academic papers or specialized textbooks [4]. There is salient work in the development of analytical
models which demonstrated empirically to accurately approximate real-network dynamics while
obtaining considerable reductions in simulation execution time. There exist fluid-flow models for
several network protocols (especially different TCP versions) and varied network conditions.
For the models in this Thesis we chose to base equations on one of the most referenced fluid-flow
models in the literature, namely [24]. We do not focus on the mathematical model or techniques to
infer equations, but rather on the numerical simulation method to approximate the equations and
on the network modeling experience. In this sense, most of the fluid-flow literature approximates
equations with classical numerical methods, which result in discrete-time simulations. Instead,
we rely on the QSS numerical approximation methods, which can be naturally represented as
discrete-event DEVS models. This allows the simulation of fluid-flow models within the same
formalism and tools as used for packet-level simulations [5]. Chapter 5 describes our proposed models
and shows that simulation execution time and accuracy are comparable to results found in the
literature. On the other hand, in the fluid-flow literature there is a strong coupling between the
mathematical model, the numerical approximation methods, and the network modeling experience.
To represent a communication network, the modeler needs to have strong knowledge in these three areas.
On the contrary, as shown in Section 5.5, the proposed approach decouples these three domains
of knowledge, providing the network modeler with the same visual interface (or generation tools)
as for packet-level models, enabling the specification of network topologies without knowledge of
ODEs or numerical methods. Network models encapsulate equations which are specified in block
diagrams, so researchers can develop new ODE models independently of the numerical methods
used to simulate them.

Hybrid network simulation. A few authors proposed the integration of fluid-flow models into
well-known network simulators [9], [10], [25], [26]. These models are reported to maintain the performance
advantages of fluid-flow models while keeping accurate packet-level flows. In all cases, on top of
the packet-level and fluid-flow challenges, there exists the need to synchronize the time management
systems of discrete-event and discrete-time simulators. A key difference of our hybrid approach,
described in Section 5.6, is that no such synchronization is necessary. We profit from both packet-level
and fluid-flow simulations being represented under a discrete-event formalism to build hybrid
simulations that do not require time synchronizations and interact smoothly.

1.4 Thesis Organization


The Thesis is organized in six Chapters as follows.
This first Chapter introduced the motivations and scope of the Thesis together with a short
summary of the main contributions and supporting peer-reviewed publications.
Chapter 2 presents an overview of the main topics and theoretical background on which the
Thesis is grounded. First, it introduces basic concepts of network performance evaluation and
describes the packet-level and fluid-flow simulation approaches, which are the main topics of the
Thesis. Then, an overview of hybrid dynamic systems and classical numerical integration is presented.
The DEVS formalism and QSS integration methods are presented, showing the background
that enables the simulation of discrete events and the approximation of continuous systems within a
unified framework. Finally, a short review of the PowerDEVS simulator is presented as it is the
practical tool chosen in this Thesis to implement the new methods and techniques.
Chapter 3 describes the methodological modeling and simulation aspects followed throughout
the development of the Thesis. An overview of the motivating case study of the DAQ network
in the ATLAS experiment at CERN is presented. A DEVS-based methodology tailored for the
modeling and simulation of the DAQ network is described together with other aspects that provided
a methodological framework for the rest of the developments.
In Chapter 4, the packet-level model developed under the DEVS formalism is described. Fol-
lowing the bottom-up approach, first the simulation model of the network layer is presented. The
TCP protocol model is described and compared against the OMNET++ implementation. Then,
models supporting the construction of network topologies are described and a case study of a
medium sized topology is shown. Later, models for the applications for the DAQ network are
presented and two scenarios are used to compare real world metrics with simulation results. Fi-
nally, load balancing strategies are studied through packet-level simulation and compared against
analytic results from queuing theory. Findings from this study are applied back into the simulation
model of the DAQ network and then real world metrics confirm the performance improvements
predicted by the simulation.
In Chapter 5, the fluid-flow and hybrid models developed also under the DEVS formalism are
described. The buffer-server system, which will be the central component of fluid-flow and hybrid
models, is first presented. The set of equations that describe the buffer-server system are
characterized as Functional Delay Equations (FDE), and the challenges for their numerical approximation
are described. To handle the delayed nature of the equations, a new QSS method is developed to
support the numerical approximation of Retarded FDEs with implicit delays. A new QSS bounded
integrator is described to enforce maximum and minimum integration boundaries by appropriately
handling sharp discontinuities present in continuous network variables. Later, the Chapter de-
scribes the models that encapsulate ODEs describing network dynamics. A modular approach is
presented in which fluid-flow topologies are constructed without requiring knowledge of underlying
equations. These models are compared against packet-level equivalent models where it is shown
that the fluid model performance is independent of link speed and can approximate packet-level
metrics with acceptable accuracy. Finally, a novel hybrid approach is presented to integrate the
packet-level and fluid-flow models, profiting from the common DEVS representation and QSS
properties. Experiments show the hybrid model does not require clock synchronization or smoothing
techniques, maintains the performance advantages of the fluid model, and provides detailed
packet-level traces.
Chapter 6 wraps up the Thesis with overall conclusions and discusses possibilities for future
research.
Summary: Introduction

Modeling and simulation (M&S) is a key discipline for the study of dynamics in complex
packet-switched network systems. The central case study in this Thesis is the Trigger and Data
AcQuisition (TDAQ) network in the ATLAS experiment at CERN [1], which reduces approximately
160 Gbps down to approximately 1.6 Gbps in real time. Simulation models are a useful tool
for the study of protocols and for the design and planning of future networks [2].

There are two approaches to network simulation. On the one hand, packet-by-packet simulation
yields fine-grained results close to real metrics [3]. This approach uses a discrete-event paradigm.
On the other hand, simulation by fluid approximations [4] uses ordinary differential equations
(ODEs) that capture the average behavior of the system. There is a vast literature [5] on solving
ODEs, where the classical methods use a discrete-time paradigm.

The simulation community has specialized either in discrete-event simulation or in techniques
and tools for continuous systems. For the M&S of the TDAQ network, detailed packet-level models
are needed to study network behavior, but simulation times impose a practical limit that will only
grow in the future [6]. On the other hand, fluid approximation models provide a scalable solution,
but only averaged metrics are obtained, which are not sufficient to understand the relevant details
of the system.

A fundamental question then arises: is it possible to obtain the best of both worlds?

Hybrid simulation deals with the interaction between discrete and fluid parts. Although the
idea is appealing, the approach did not prosper. Currently, the bibliography on hybrid network
simulation is scarce, and the existing approaches are either smart ad-hoc algorithms covering
certain cases or adaptations of particular tools.

We claim that this is a consequence of the lack of theoretical and practical tools to provide
a sound framework for the simulation of hybrid systems. Packet-by-packet tools and classical
ODE solvers are intrinsically disparate. At the heart of the problem lies the attempt to force one
paradigm to fit into the other.

We thus pose the following research hypothesis: "Packet-level, fluid and hybrid models can be
represented under a single modeling and simulation formalism, providing expressiveness advantages
for modeling and performance advantages for simulation". Throughout this Thesis we develop new
theoretical and practical foundations to unify the three approaches, with the central goal of
providing formal network models that are generic, reusable and modular.

Part of the contributions of this Thesis were published in an indexed journal and in 4 peer-reviewed
publications at international conferences. The Thesis is organized in 6 chapters and an annex.

Chapter 2

Background

If E is considered to be a continuously divisible quantity, this distribution is possible in
infinitely many ways. We consider, however—this is the most essential point of the whole
calculation—E to be composed of a well-defined number of equal parts, and use thereto the
constant of nature h

Planck's constant h = 6.55 × 10^-27 erg·sec, as introduced in 1900

2.1 Network Simulation Approaches: Packet-Level and Fluid-Flow
Performance evaluation of data networks has been a key component of successful engineering
projects, ranging from data-center design to Internet protocol studies. The number of network
users, services, and applications grows exponentially. Improvements in network technologies have
engaged many researchers, developers, and designers in optimizing networked computer systems'
performance. Network designs and proposals must be proven effective and validated before they
are commissioned and deployed.

2.1.1 Network Performance Evaluation Techniques


In general, the literature distinguishes between three approaches to study the performance of data
networks: mathematical (or analytical) modeling, simulation modeling, and testing. Each technique
presents pros and cons, and there exists a vast literature for each of them, with several
books discussing when and how an analyst should choose one or the other [2], [27]. Techniques
can also complement each other for cross-validation purposes. Methodologically, it is advised to
work with at least two techniques.
Mathematical modeling employs equations to describe and predict the behaviour of a system,
often making several simplifying assumptions that allow for tractable formulae. Commonly
used analytical models are those of queues, and there exist multiple models to represent network
traffic [28], [29]. On the one hand, mathematical models are inexpensive and fast to experiment
with for different parameters, making them suitable for large-scale scenarios. On the other hand,
they can provide limited insights and coarse-grained results, and also require strong mathematical
backgrounds. In many cases their solutions cannot be obtained analytically, and numerical
methods are then used to simulate the equations [4].
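As a minimal illustration of the analytical approach, the classic M/M/1 queue (a standard model for a single buffer served at a constant rate) admits closed-form expressions for utilization, mean occupancy and mean delay. The sketch below, with arbitrarily chosen rates, is illustrative and not taken from the Thesis.

```python
# Illustrative example: closed-form metrics of an M/M/1 queue, a commonly
# used analytical model for a single network buffer.
# lambda_ = arrival rate, mu = service rate; the queue is stable only for rho < 1.

def mm1_metrics(lambda_, mu):
    rho = lambda_ / mu                 # utilization
    if rho >= 1.0:
        raise ValueError("unstable queue: rho >= 1")
    L = rho / (1.0 - rho)              # mean number of packets in the system
    W = 1.0 / (mu - lambda_)           # mean time spent in the system
    return rho, L, W

rho, L, W = mm1_metrics(lambda_=800.0, mu=1000.0)   # rates in packets per second
print(f"rho={rho:.2f}  L={L:.2f} packets  W={W*1000:.2f} ms")
```

For these rates the formulas give rho = 0.80, L = 4 packets and W = 5 ms, consistent with Little's law (L = lambda * W); note how such closed forms are cheap to evaluate but say nothing about transient or per-packet behavior.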
Testing real network hardware, applications and protocols, on the contrary, provides accurate
and (hopefully) credible measurements. This requires an implementation of the system to be
studied, which is often too expensive or requires too much effort to set up, or is actually impossible
to access in a timely manner. Direct experimentation also provides less flexibility in experimental
conditions, and it is often difficult to record very detailed metrics and understand corner-case
behaviours. Real implementation approaches generally suffer from scalability limitations [2].
Simulation modeling techniques rely on executable abstractions of the real system which
can later be run on a computer. Compared to analytical models, they require more effort to code
the simulation and usually require more computing resources to produce results, making them less
appealing for high-speed large-scale scenarios. On the other hand, simulation models require fewer
assumptions than analytical models, and yield more accurate and credible results. Compared to
testing real systems, models can never take all aspects into consideration, and simulation results
are then usually less credible (although cases exist where bad testing practices can be detected
by having simulations to contrast results against). Simulation models provide more flexibility to
change experimental conditions, are usually (although not always) less costly, and can be used to
evaluate a technology which does not exist yet [3].
Lastly, Emulation employs an experimental setup containing both real-world and simulated
components [30], resulting in a technique with the advantages and disadvantages of its constituents.
On the one hand, it requires both the development of a simulation model and the availability of
real hardware, with the associated costs and efforts. On the other hand, it is less costly and more
flexible than a fully realistic setup, and results are generally more credible than fully simulated
results. Also, the interaction between simulated components and real ones is a challenge in itself,
often restricting the simulation execution to be in real time only.

We shall focus on simulation modeling techniques, which provide reliable and flexible ways to
develop and test ideas on large-scale network scenarios.

2.1.2 Scaling Limits: From Stochastic Discrete Systems to Deterministic Fluid Models
Throughout our study we look at data networks as intrinsically discrete systems driven by stochas-
tic dynamics.

In this Thesis we assume that all conditions are given to obtain a fluid approximate
model as a correct scaling limit for an underlying stochastic discrete network.
That is, we are not interested in deriving new fluid models, but rather in their simulation
aspects as combined with discrete models and, more specifically, in the efficient
simulation of their transient behavior.

Yet, some comments are in order to provide a broader view for the sake of completeness.
24 Chapter 2. Background

2.1.2.1 Zooming Out to Reduce Complexity


When “zooming out” a data network analysis from the component level (e.g. queues) to the system level (e.g. the Internet), many new, hard-to-model sources of randomness kick in.
Perhaps the most comprehensive single resource for a thorough network-oriented mathematical treatment of limit scales for congestion control is that of [31]. There, stochastic models are developed for a single link used by congestion-controlled sources, showing that when the number of users in the network is large, a deterministic equation can be obtained to characterize the system behavior (see also [32]). This idea is analogous to the well-known law of large numbers for random variables [33]. Stochastic models are also developed to describe the variability around the deterministic limit (analogous to the central limit theorem for random variables [33]). Models are then obtained using the interaction between congestion control at the sources and congestion indication mechanisms at the router.
In fact, a myriad of systems can be effectively described by stochastic models, for instance biological systems [34], epidemic spreading [35], and queuing networks [36]. The common underlying feature is that systems are composed of a set of “agents” (objects, units, entities) interacting
together. Each individual agent is typically described in a simple way, such as finite state machines
with few states. An agent changes state spontaneously or by interacting with other agents in the
system. All transitions happen probabilistically and take a (possibly random) time to complete.
In data networks, agents are usually network nodes (e.g. a router) running some logic (e.g. a
protocol) that perform actions on other types of discrete agents (e.g. a data packet, or a stream
of packets of a same “class”). The data workload imposed on the network is driven by stochastic processes that define, e.g., the arrival time, size, and possibly also the route or discard probability of packets.

Deterministic fluid approximations can describe the large scale behavior of these stochas-
tic discrete systems using simpler Ordinary Differential Equations (ODE), i.e., a deter-
ministic, continuous state and continuous time type of model.

Such scaling limits (also called “mean-field limits” when agents asymptotically interact through the averaged measure of the system, or “hydrodynamic limits” in more general cases in the context of particle systems) are used in many fields, from population biology and the physical sciences to queuing networks. They are also known as “fluid limits”, usually in the context of queuing models.
Several mathematical tools are required to extract a correct ODE fluid limit candidate out of the
underlying discrete system. Aspects about existence and uniqueness of the solutions, convergence
in probability, transient vs. asymptotic limits etc. must be considered carefully. They involve
deep aspects of probability theory, and their analysis can change drastically even with apparently
small variations in the underlying discrete system. For instance, the mathematical treatment of the
mean stationary occupancy of a buffer fed by a Poisson-driven arrival process changes considerably
if we study a finite or infinite buffer.

The basic idea of scaling limits [37] is that, when counting the number of discrete agents
that are in a given state, the fluctuations due to stochasticity become negligible as the
number of agents N grows.
For sufficiently large N the system becomes essentially deterministic.

A series of results, e.g. [38], [39], have established that when the state space of each agent is finite and the dynamics are sufficiently smooth, the system’s behaviour converges as N → ∞ to a limiting behaviour described by a system of ODE. The dimension of said system is equal to the number of states of the individual agents, but independent of the population size N. This dimension is typically small, and hence the numerical integration of these ODE can be extremely fast (at least compared to a detailed simulation of the original discrete stochastic system). These results show that the intensity of the fluctuations around the fluid limit goes to zero as 1/√N.
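This 1/√N behaviour is easy to observe numerically. The sketch below is our own illustrative example (the two-state agent model, the rates, and the function names are not taken from the cited references): N agents switch OFF → ON at rate λ and ON → OFF at rate µ, whose fluid limit is ẋ = λ(1 − x) − µx with equilibrium λ/(λ + µ).

```python
import math
import random

def ssa_fraction_on(N, lam=1.0, mu=1.0, t_end=40.0, seed=0):
    """Gillespie simulation of N two-state agents: OFF -> ON at rate lam,
    ON -> OFF at rate mu. Returns stationary samples of the ON fraction."""
    rng = random.Random(seed)
    on, t, samples = 0, 0.0, []
    while t < t_end:
        r_up, r_down = lam * (N - on), mu * on
        total = r_up + r_down                 # total event rate, always > 0
        t += rng.expovariate(total)           # time to the next event
        if rng.random() < r_up / total:
            on += 1                           # one more agent switches ON
        else:
            on -= 1
        if t > t_end / 2:                     # discard the transient
            samples.append(on / N)
    return samples

def std(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

# Fluctuations around the fluid equilibrium x* = lam/(lam+mu) = 0.5
s_small = std(ssa_fraction_on(100))                # N = 100
s_large = std(ssa_fraction_on(10000, t_end=20.0))  # N = 10000
```

With λ = µ = 1 the stationary fraction concentrates around 0.5, and the measured standard deviation for N = 10000 comes out roughly ten times smaller than for N = 100, consistent with the 1/√N scaling.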

2.1.2.2 On Markov Processes


Perhaps the most prominent analytical tool to obtain scaling limits in stochastic networks is the Markov model (introduced by A.A. Markov in 1907; see [40]), in particular Continuous Time Markov Chains (CTMC). Markov Chains are omnipresent in applied probability as they allow obtaining models
for stochastic processes with appropriate tradeoffs between complexity and computability. The
models should be simple enough to be attacked mathematically, but complex enough to make
us discover relevant behaviours. Scalings can drastically simplify the intrinsic information of the
process, but at the same time retain some of its crucial features.
Obtaining scaling limits for Markov Chains can be done in several ways (e.g. resorting to the
theory of processes known as martingales). One of the first authors to prove such results for the
convergence of Markov processes to deterministic ODE was Kurtz in 1970 [38]. Markov Chains have been used extensively in performance analysis since at least the fifties and remain ubiquitous in performance and reliability analysis.
Following [40], a Markov Chain consists of a set of states and a set of labeled transitions between
the states. A state of the Markov chain can model various conditions of interest in the system
being studied. These could be the number of jobs of various types waiting to use each available
resource, the number of resources of each type that have failed, the number of concurrent tasks of
a given job being executed, and so on. After a sojourn in a state, the Markov Chain will make
a transition to another state. Such transitions are labeled with either probabilities of transition
(in case of discrete-time Markov chains) or rates of transition (in case of continuous-time Markov
chains).
Long run (or steady-state) dynamics of Markov Chains can be studied using a system of
linear equations with one equation for each state. Transient (or time-dependent) behavior of a
continuous-time Markov Chain gives rise to a system of first-order, linear, ODE. Solutions for
these equations result in state probabilities of the Markov Chain from which desired performance
measures can be easily obtained.
The number of states in a Markov Chain of a complex system can become very large, and,
hence, automated generation and efficient numerical solution methods for underlying equations
are desired. If the Markov Chain has a nice structure, it is often possible to avoid the generation
and solution of the underlying (large) state space. For a class of queuing networks, known as Product-Form Queueing Networks (PFQN), it is possible to derive steady-state performance measures without resorting to the underlying state space.
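As a toy illustration of both the steady-state linear equations and the transient first-order linear ODE mentioned above, consider a two-state chain (the chain and its rates below are our own example, not taken from [40]):

```python
# Two-state CTMC (rates a, b are arbitrary illustrative values):
#   state 0 --a--> state 1,   state 1 --b--> state 0
a, b = 2.0, 3.0

# Steady state: solve pi*Q = 0 with pi0 + pi1 = 1 (balance: pi0*a = pi1*b)
pi0 = b / (a + b)
pi1 = a / (a + b)

# Transient behaviour: the first-order linear ODE dp0/dt = -a*p0 + b*(1 - p0),
# integrated here with a small forward-Euler step starting from p0(0) = 1
p0, h = 1.0, 1e-3
for _ in range(20000):        # integrate up to t = 20, well past the transient
    p0 += h * (-a * p0 + b * (1.0 - p0))
# p0 has now converged to the steady-state probability pi0
```

The transient solution relaxes to the steady-state vector at rate a + b, illustrating how the time-dependent ODE and the stationary linear system describe the same chain.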

To conclude, in this Thesis we will make extensive use of models obtained by means of scaling
techniques when we deal with fluid-flow (and later, hybrid) simulation. Also, fluid models serve as
a useful first approximation tool when we deal with the simulation of load balancing models.

2.1.3 Network Simulation Approaches


Mainly two very disparate approaches dominate the modeling and simulation (M&S) of data
networks: the packet-level and the fluid-flow approaches.
The packet-level approach yields fine-grained results comparable to real data networks, but its
complexity makes it unsuitable for the simulation of large high-speed networks. The fluid-flow
approach relies on ODE approximations for faster simulation, but captures only averaged network
behavior. Classically, each approach requires different knowledge and tools, making network experts adhere to one or the other.

In the classical packet-level approach, network entities are modeled relying on different sched-
ulers to make the simulation advance in a discrete-event fashion. In general, they enforce a
packet-by-packet treatment where messages are sent from one model to another whenever a real
data packet would be sent through real network layers. In this sense, discrete-event simulation algorithms fit very well, as they allow scheduling events (e.g. packet departures) at arbitrary points in time. Just to name a few, ns-2 [18], ns-3 [19], OMNET++ [20], and OPNET [21] use discrete-event
schedulers for their simulations. These tools have been widely used in academic studies and commercial projects and provide a wide variety of networking features readily implemented [3]. Few packet-level network simulators are supported by formal specifications that guarantee correctness.
Packet-level models implement applications, control logic, and protocols very closely to the
algorithms actually implemented in the real software/hardware products. Models provide fine-
grained results that can match closely real networks at the cost of increasing model complexity.
Unfortunately, it is well known that there exist issues in packet-level simulation performance scal-
ability when the complexity of the network grows (either due to topology complexity, throughput
intensity, or a combination of both).
Simulation time grows (at least) linearly with the number of nodes and with link speed, making the approach unsuitable for large high-speed networks. Such issues often impose a limitation on the quality and/or time-to-delivery of the answers that can be obtained via simulation. Moreover, network
technologies have witnessed an exponential growth in terms of the bandwidth [41] and topology
size (from massive clusters to the Internet itself). This situation increases the gap between the
performance capabilities of current network simulation techniques and real-world networks.

The fluid-flow approach proposes models of higher abstraction to describe network behavior
through a set of equations, trading speedups for coarser-grained accuracy. These models do not
see the individual packet-by-packet behaviour (they "zoom out" and represent data as a fluid
flow of information). Dynamically changing packet rates are modeled as ODE capturing averaged
behaviour. ODE models can get complex resulting in expressions that require numerical solutions.
The biggest advantage of this approach is that, contrary to packet-level models, the performance
of numerically solving fluid models grows linearly only with the number of nodes, but is (a priori)
independent of link speeds.
The first successful fluid models for packetized networks date back to the sixties with the development of ARPANET [42]. Throughout the years, new models were proposed for different protocols, network conditions, traffic patterns, and congestion control techniques [4]. The dynamics of congestion control, and more specifically of the TCP protocol, is one of the most studied topics in the fluid-flow literature, as TCP represents more than 90% of the Internet’s traffic today [31].

A prominent fluid-flow model of TCP is the one presented by Misra, Gong, and Towsley (MGT) in [24], [43], [44]. They proposed a set of ODEs based on stochastic Poisson processes to describe the steady-state throughput of a TCP connection over a network of routers implementing Random Early Detection (RED). MGT showed empirically that the equations approximate quite well real TCP
connections [43]. Another key strength of the MGT model is its simplicity. For example, the congestion window and the throughput can be modeled as functions of the packet loss rate and the round-trip time by the following ODEs:

di(t) = Wi(t) / τi(t)                                    (2.1)

dWi(t)/dt = 1/τi(t) − (Wi(t)/2) µi(t − τi(t))            (2.2)
where d(t) is throughput (departure rate), W (t) is the TCP congestion window size, τ (t) is
the round-trip time, and µ(·) is the packet loss rate, with subindex i denoting the i-th flow in the
system.
Although the MGT model makes several assumptions (bulk long-lived TCP flows, routers implementing RED) and represents only some TCP phases, it has been successfully used in different contexts and extended with additional features throughout the years [10], [24], [45]–[47].
As one can observe from Equation (2.2), solving this model involves numerical integration.
Luckily for researchers who invest effort in deriving reasonable equations, there is a vast literature on continuous system simulation to approximate ODEs [5], [48]–[50]. Most of these
ODE solvers result in discrete-time simulations as the time variable advances with discrete steps
to produce Taylor series approximations of the original continuous system.
Some tools that allow solving ODEs include Matlab [51], Modelica [52], libraries for specific
languages (e.g. GSL for C++ [53] or Python [54]) and more [55]. It is nevertheless not uncommon
to find custom implementations of the methods to incorporate specific features [10], [24].
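As a concrete illustration, the MGT model above can be integrated with a few lines of forward Euler. The sketch below is our own minimal example: for simplicity it holds the round-trip time τ and the loss rate µ constant (so the delayed term in Eq. (2.2) becomes trivial), with arbitrary parameter values not taken from [24]:

```python
# Forward-Euler integration of the MGT fluid model, Eqs. (2.1)-(2.2), for one
# flow. tau and mu are held constant for simplicity (illustrative values only).
tau, mu, h = 0.2, 0.5, 1e-3      # RTT [s], loss rate [1/s], Euler step [s]
W = 1.0                          # congestion window [packets]
for _ in range(100000):          # 100 s of model time
    dW = 1.0 / tau - (W / 2.0) * mu   # Eq. (2.2); constant mu makes the delay moot
    W += h * dW
throughput = W / tau             # Eq. (2.1): d(t) = W(t)/tau

# At equilibrium dW/dt = 0, so W* = 2/(tau*mu) = 20 packets and d* = 100 pkt/s
```

With a time-varying µ(t − τ(t)) the delayed argument would require keeping a history buffer for µ, which is precisely where the DDE-related difficulties discussed in Section 2.2 appear.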

Packet-level and fluid-flow approaches entail completely different sets of tools, knowledge and expertise. Mastering either of them requires considerable effort, and the knowledge acquired using one approach can rarely be transferred to the other. This comes from the fact that each approach relies on different theoretical foundations in the discrete-event, discrete-time and continuous realms.

2.2 Hybrid Dynamic Systems


Systems Theory divides dynamic systems into three categories according to the evolution of their variables in time: 1) Continuous Time Systems, where variables change continuously with time; 2) Discrete Time Systems, where variables change at regular time instants; and 3) Discrete Event Systems, where variables change at any instant but only a finite number of times in a finite interval.
Discrete systems can be simulated directly in a computer without further manipulation, while continuous systems require numerical approximations, basically by discretizing the time variable [5], [48]–[50].
Systems that combine continuous and discrete characteristics are called hybrid systems. Hybrid
mathematical models contain differential equations with variables that interact with other variables that evolve in a discrete fashion. This interaction between discrete and continuous domains creates discontinuities in the differential equations, which are not easily handled by classic numerical
integration methods. The difficulty arises when integration algorithms need to detect the exact
instants at which discontinuities occur, and restart simulation from there on, with an important
increase in computational costs [5].
A similar problem occurs in numerical methods for delayed ODEs, namely Delay Differential Equations (DDE), where the evolution of variables depends on both the current and past times (just like Equation (2.2)). There, discontinuities in the derivatives are an intrinsic phenomenon. To tackle these issues, special algorithms have been developed [56], [57], and hybrid systems with DDEs combine both challenges [58].
Hereinafter, we will review some characteristics of time discretization methods and discrete-event approaches to showcase the importance of adopting a common framework for hybrid systems, and ultimately for unifying packet-level and fluid-flow network simulation.

2.3 Classic Integration Methods


In most scientific fields real world phenomena can be modeled (either exactly or approximately) as
continuous time systems, and then represented by a set of Ordinary Differential Equations (ODEs).
In many cases it is hard or impossible to find analytical solutions for these models, in which case
numerical integration is used to simulate the continuous time system.
A dynamical system can be typically represented in the form of a state–space model:

ẋ = f (x(t), u(t)) (2.3)


x(t0 ) = x0 (2.4)

where x(t) ∈ Rn is the state vector, u(t) ∈ Rw is the input parameter vector, x0 are the initial conditions, and t is the independent variable representing time. Each component xi(t) of the state vector represents the trajectory of the i-th state variable as a function of time [5].
If the state equations do not contain discontinuities in fi (x(t), u(t)) nor in its derivatives, then
the solution xi will also be a continuous function. Moreover, the function can be approximated
with any degree of desired accuracy by the expansion of the Taylor series around any point in the
trajectory. These properties are the central basis for any integration method.
If we wanted to approximate the function xi in the instant t∗ using Taylor series and evaluate
the approximation at time t∗ + h, then the value of the trajectory can be described as follows:

xi(t∗ + h) = xi(t∗) + (dxi(t∗)/dt) h + (d²xi(t∗)/dt²)(h²/2!) + . . .      (2.5)
Replacing (2.3) into (2.5), the series becomes:

xi(t∗ + h) = xi(t∗) + fi(t∗) h + (dfi(t∗)/dt)(h²/2!) + . . .      (2.6)
Different integration algorithms vary in how they approximate the higher state derivatives, and
in the number of terms of the Taylor–Series expansion that they consider in the approximation.

There exists a vast literature on multiple methods with different features that make each method suitable for solving different types of problems. Just to name a few well-known algorithms: Euler [59], Runge-Kutta [60], and DASSL [61]; for a more complete introduction to continuous system simulation the reader can refer to [5].
All classical methods make a time discretization of the original ODE to simulate the system
within a computer. Algorithms are strongly based on the discrete advancement of time controlled
by a fixed (or adaptive) step (h in Equation (2.6)). The discrete representation of time makes it challenging to incorporate events that arrive between consecutive steps, at any (continuous) point in time.
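The time-stepped nature of these methods can be seen in a minimal forward Euler implementation (an illustrative sketch of our own; Euler keeps only the first two terms of the Taylor expansion (2.6)):

```python
import math

def euler(f, x0, t_end, h):
    """Forward Euler: advances time with a fixed step h, truncating the
    Taylor series (2.6) after its first-order term."""
    x, t = x0, 0.0
    while t < t_end - 1e-12:
        x += h * f(x)    # x(t+h) ~= x(t) + f(x(t))*h
        t += h
    return x

# Example: x' = -x, x(0) = 1, whose exact solution is exp(-t)
err_h  = abs(euler(lambda x: -x, 1.0, 1.0, 0.01)  - math.exp(-1.0))
err_h2 = abs(euler(lambda x: -x, 1.0, 1.0, 0.005) - math.exp(-1.0))
# Halving h roughly halves the global error of this first-order method
```

Note that the method only “visits” the time instants t = kh; an event arriving between two such instants would force either an interpolation or a restart of the step, which is precisely the difficulty discussed above.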

Previous efforts to combine fluid and discrete simulation of networks must struggle, in one
way or another, to accommodate the asynchronous nature of packets with the synchronous step h
embedded at the core of classical methods. We shall instead explore other avenues, in an attempt
to get rid of the synchronous time slicing paradigm, and relying on features offered by the DEVS
formal framework introduced next.

2.4 The Discrete Event System Specification (DEVS) Formalism
DEVS [7], [62], [63] is a mathematical formalism for modeling and simulation based on general systems theory, i.e., it reasons about systems’ behavior and structure independently from any specific application. DEVS atomic and coupled models encapsulate behavior and allow systems
to be built in a modular and hierarchical way. Since its original formulation many properties have
been proven, including homomorphism with many other methodologies (Finite State Automata,
Petri Nets, Grafcets, Statecharts, etc.) which can be represented by DEVS systems. This turned
DEVS into a commonly used formalism for modeling and simulation of most discrete systems,
including discrete time systems [63].
DEVS allows describing exactly any discrete system, and approximating continuous systems numerically with any desired degree of accuracy. The latter concept will be introduced in the next section on QSS methods.
The formal specification of models provides tools for their analytic manipulation and offers independence in choosing the programming language for practical implementations.
DEVS enforces a strict separation between modeling (model description, done by the modeler to describe a new problem) and simulation (the execution of models). Several practical tools have been developed for the modeling and simulation of DEVS-based systems [64]–[68].
As shown in Figure 2.2 (left), a DEVS model processes an input event trajectory and, according to its own internal state, produces an output event trajectory. An event represents an instantaneous change in some part of the system, and can be characterized by a value and the time of its occurrence. The value can be a number, a word, or in general any element of an arbitrary set X.
A trajectory is defined by a sequence of events. It takes the value φ (or NO EVENT) at all time instants except at the instants where there are events. At these instants, the trajectory takes the value corresponding to the event. Figure 2.1 shows an event trajectory which takes values x2 at time t1, x3 at t2, and so on.

Figure 2.1: Example of a generic discrete-events trajectory

DEVS models are described as a hierarchical composition of atomic models (Ms) and coupled models (CMs). CMs define the system structure (interconnections between Ms and other CMs), whereas Ms define dynamic behaviors. Mathematically, they are defined by tuples, as shown in Figure 2.2.

Figure 2.2: Basic DEVS atomic models (left) and coupled models (right)

2.4.1 DEVS Atomic Models


Formally, a DEVS atomic model M is defined with the following structure:

M = {X, Y, S, δint , δext , λ, ta } (2.7)

where
• X is the set of all possible input event values,

• Y is the set of all possible output event values,

• S is the set of internal states,



• δint , δext , λ, and ta are functions that define the dynamics (behaviour) of the model

Figure 2.3 shows illustrative dynamics of a DEVS atomic model along with its main functions. Each possible model state s ∈ S has an associated lifetime defined by the time advance function ta : S → R0+. When the model is in state s1, after ta(s1) units of time the system performs an internal transition, evolving into a new state s2 = δint(s1). δint : S → S is called the internal transition function. At the same instant, an output event is produced with value y1 = λ(s1). The function λ : S → Y is called the output function.
When an atomic model receives an input event x1 ∈ X, a transition is triggered that instantly changes the model state to s4 = δext(s3, e, x1), where s3 is the model state when the input event arrives, and e is the elapsed time since the last state transition (with e ≤ ta(s3)). The function δext : S × R0+ × X → S is called the external transition function.

Figure 2.3: Example events/states trajectories for a DEVS atomic model.
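To make the structure M = {X, Y, S, δint, δext, λ, ta} concrete, the sketch below hand-codes an illustrative atomic model in Python (class and method names are our own, not those of any DEVS tool): a one-slot delay that re-emits each received value d time units later.

```python
INF = float("inf")

class Delay:
    """Illustrative DEVS atomic model: stores one input value and outputs it
    after d units of time. State S = the value being held (or None when idle)."""
    def __init__(self, d):
        self.d = d
        self.stored = None

    def ta(self):                  # time advance: lifetime of the current state
        return self.d if self.stored is not None else INF

    def out(self):                 # output function (lambda), fired before delta_int
        return self.stored

    def delta_int(self):           # internal transition: output done, go idle
        self.stored = None

    def delta_ext(self, e, x):     # external transition: input x after elapsed e
        self.stored = x
```

An abstract simulator would query ta(), call out() followed by delta_int() once that lifetime expires, and call delta_ext(e, x) whenever an input event arrives; the infinite time advance encodes a passive state.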

2.4.2 DEVS Coupled Models


Coupled DEVS models, shown in Figure 2.2 (right), define the structure of the system (interconnections between coupled and atomic models).
Formally, a DEVS coupled model is defined by the following structure:

CM = {X, Y, D, {Md }, {Id }, {Zi,d }, Select} (2.8)


where:
• X and Y are the sets of input and output values for the coupled model
• D is the set of references to the internal components such that for each d ∈ D, Md is a DEVS
model (atomic or coupled)
• for each d ∈ D ∪ {N }, Id ⊂ (D ∪ {N }) − {d} is the set of models influencing the subsystem d
• For each i ∈ Id, Zi,d is the translation function:

Zi,d : X → Xd    if i = N
Zi,d : Yi → Y    if d = N          (2.9)
Zi,d : Yi → Xd   otherwise

• Select : 2D → D is a tie-breaking function for simultaneous events, satisfying Select(E) ∈ E, where E ⊆ D is the subset of components that produce the simultaneous events.

2.4.3 DEVS Abstract Simulator


The DEVS Abstract Simulator (or simulation algorithm, or scheduler) for atomic models M and for coupled models CM is universal, unambiguous, easy to implement, and independent of programming languages.
The DEVS abstract simulator defines how the simulation advances. It can be seen as a univer-
sal scheduler for DEVS. Atomic models are simulated by Simulators, while Coupled models are
simulated by Coordinators. Each simulator or coordinator has a local variable tn which indicates
the time when its next internal transition will occur. At simulators, tn is calculated using the
time advance function of the corresponding atomic model. At coordinators, tn is calculated as the
minimum tn among its children [66].
The basic idea for the simulation of a coupled DEVS model can be described by the following
steps:
1. Look for the atomic model that, according to its time advance and elapsed time, is the next to
perform an internal transition. Call it d* and let tn be the time of the mentioned transition.
2. Advance the simulation time t to t = tn and execute the internal transition function of d*.
3. Propagate the output event produced by d* to all of the atomic models connected to it
executing the corresponding external transition functions. Then, go back to step 1.
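The three steps above can be sketched as a flat, single-level simulation loop (a simplified illustration of our own: real DEVS simulators are hierarchical, exchange messages between simulators and coordinators, and resolve ties with the Select function, all omitted here):

```python
def simulate(models, couplings, t_end):
    """models: atomic models exposing ta()/out()/delta_int()/delta_ext(e, x).
    couplings: dict mapping each model to the list of models it influences."""
    t = 0.0
    last = {m: 0.0 for m in models}      # time of each model's last transition
    while True:
        # Step 1: find the imminent model d* with minimum tn = last + ta
        d_star = min(models, key=lambda m: last[m] + m.ta())
        tn = last[d_star] + d_star.ta()
        if tn > t_end:
            return t
        # Step 2: advance global time and run the internal transition of d*
        t = tn
        y = d_star.out()
        d_star.delta_int()
        last[d_star] = t
        # Step 3: propagate the output event to the influenced models
        for m in couplings.get(d_star, []):
            m.delta_ext(t - last[m], y)  # e: elapsed time since m's last transition
            last[m] = t

# Tiny example models (illustrative): a periodic source and a passive counter
class Generator:
    def __init__(self, period): self.period = period
    def ta(self): return self.period
    def out(self): return "tick"
    def delta_int(self): pass
    def delta_ext(self, e, x): pass

class Counter:
    def __init__(self): self.n = 0
    def ta(self): return float("inf")    # passive: only reacts to inputs
    def out(self): return None
    def delta_int(self): pass
    def delta_ext(self, e, x): self.n += 1
```

Coupling a Generator with period 1.0 to a Counter and simulating up to t = 5.5 delivers five ticks, with the global clock jumping directly from one event time to the next.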
The simulators and coordinators of consecutive layers communicate with each other with mes-
sages as shown in Figure 2.4. The coordinators send messages to their children so they execute
the transition functions. When a simulator executes a transition, it calculates its next state and,
when the transition is internal, it sends the output value to its parent coordinator. In all of the
cases, the simulator state will coincide with its associated atomic DEVS model state.

Figure 2.4: Hierarchical simulation of DEVS models and DEVS abstract simulator

2.4.4 Vectorial DEVS Models


There are many extensions and specializations of DEVS that tackle different needs (e.g. Cell-
DEVS [69] for cellular automata, PDEVS [70] for modeling parallelism, etc.). In particular we are
interested in Vectorial DEVS (VDEVS) [71] that allows representing large arrays of systems with
a compact representation. A VDEVS model is an array of quasi-identical classic DEVS models, which may differ in their initial parameters. Formally, the structure of a vectorial model is defined by VD = {N, Xv, Yv, P, {Mi}}, where N is the vector dimension, Xv is the set of vectorial input events, Yv is the set of vectorial output events, P is the set of parameters, and each Mi is a classic DEVS atomic model. For the interaction between vectorial and non-vectorial models, scalar-to/from-vectorial mapping models are available.

2.5 The Quantized State Systems (QSS) Methods to Solve ODEs in Hybrid Systems
A relatively young family of numerical methods, called Quantized State System (QSS) methods,
was developed based on state quantization [5]. QSS methods replace the time discretization of
classic numerical integration algorithms by the quantization of the state variables.
These methods are based on the idea, originally proposed by Bernard Zeigler, that continuous systems can be approximated by discrete-event systems under the DEVS formalism. Later, Kofman evolved the idea and developed proofs on stability, convergence and error bounds, leading to a formal definition of the QSS family [8].
Given a time-invariant ODE as in Equation (2.3), the first-order method QSS1 [8] solves the following approximate Quantized State System:

ẋ = f (q(t), u(t)); (2.10)

where q(t) is the quantized state vector, related to the state vector x(t) by a hysteresis quantization function:

qi(t) = xi(t)     if |qi(t−) − xi(t)| = ∆Qi
qi(t) = qi(t−)    otherwise                      (2.11)
where qi(t0) = xi(t0), t− denotes the left limit of instant t, and ∆Qi is a parameter called quantum. Following this definition, the quantized value qi(t) only changes when it differs from xi(t) by a magnitude ∆Qi, and after each change qi(t) = xi(t). Thus, qi(t) follows a piecewise constant trajectory and approximates xi(t) with a maximum error of ∆Qi, as shown in Figure 2.5 (left).
Figure 2.5 (right) shows an example of the quantization qi(t) for an arbitrary trajectory xi(t) following Equation (2.11).

Figure 2.5: Quantized state function with hysteresis from Equation (2.11) (left). Quantization with hysteresis ε = ∆Q (right) [8]
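The quantization logic of Equation (2.11) can be exercised with a minimal hand-coded QSS1 run for the scalar system ẋ = −x (an illustrative sketch of the method, not the implementation of any tool): between events q is constant, so x advances linearly, and the next event occurs when |x − q| reaches the quantum ∆Q.

```python
import math

def qss1(x0, dQ, t_end):
    """QSS1 simulation of x' = -x. Events are asynchronous: each step lasts
    exactly the time needed for |x - q| to reach the quantum dQ (Eq. 2.11)."""
    t, x = 0.0, x0
    q = x                         # after every event, q(t) = x(t)
    steps = 0
    while t < t_end:
        dx = -q                   # f(q) for x' = -x: constant between events
        if dx == 0.0:
            break                 # derivative is zero: next event at infinity
        sigma = dQ / abs(dx)      # time until |x - q| = dQ
        t += sigma
        x += dx * sigma           # exact linear advance of the state
        q = x                     # re-quantization
        steps += 1
    return x, steps

x_end, steps = qss1(1.0, 0.01, 1.0)
# x_end lands close to the exact solution exp(-1), with an error on the
# order of the quantum dQ, after roughly |x(0) - x(t_end)| / dQ events
```

Note that the step length σ adapts automatically: it is short where the solution changes quickly and long where it is nearly flat, in contrast with the fixed step h of the classic methods of Section 2.3.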

Based on the idea of QSS1, second-order (QSS2) and third-order (QSS3) methods were developed, replacing the piecewise constant approximations by piecewise linear and piecewise parabolic trajectories, respectively [72]. Similar methods were developed for stiff and marginally stable systems: Backward QSS (BQSS), Centered QSS (CQSS), and Linearly Implicit stiff QSS methods of orders 1 and 2 (LIQSS1 and LIQSS2) [73], [74].
As an example of the application of the QSS2 method, Figure 2.6 shows a QSS plot for the approximation of a dynamic system with the different quantized trajectories. As it is a second-order accurate method, the quantization produces a first-order approximation, and hence the trajectories of q(t) are piecewise linear. This can be seen in the first row of the plot with the variables x(t) and q(t) superimposed. The plot is shown for a generic state variable x3 that represents the third element in the state space vector x in Eq. (2.10). The dots denote the (asynchronous) instants where the method decides that the internal x(t) needs to be updated, either because an external update value arrives to the integrator or because of the internal error control mechanism. The second row in the plot shows the approximation error (the difference between x and q). The following rows show the evolution of internal error-control variables at the integrator: the quantum ∆q and the relative quantum ∆qrel. The last three rows show the coefficients of orders zero, one and two of the polynomial approximation of x(t).

Figure 2.6: QSS Plot for the approximation of the state variable x(t) with the quantized trajectory
q(t) (top row), along with internal variables for error control (second to sixth rows)

2.5.1 Properties of QSS Methods


Several important properties have been demonstrated for QSS methods. QSS methods can simulate continuous dynamic systems as in Equation (2.3) with the following properties:
• Convergence was proven for QSS methods in [8]: when the quanta ∆Q are chosen sufficiently small, the solutions of Equation (2.10) approach the solutions of Equation (2.3).
• Stability: QSS preserves numerical stability without implicit formulas. The quantization perturbation is bounded by the hysteresis ∆Q, and the trajectory described by QSS methods is ultimately bounded [75]. Non-linear stability can be studied by means of Lyapunov functions [8]. A sufficient condition for stability is that the function f be continuous and continuously differentiable.
• Global error bounds: A global error bound for linear time-invariant (LTI) systems was studied in [76], [77]. The QSS simulation will never differ from the analytical solution by more than a finite value proportional to the quantum ∆Q. An arbitrarily small simulation error can be achieved when a sufficiently small ∆Q is used. Moreover, the error bound does not depend on the initial condition and stays constant during the simulation (which is not true for discrete-time methods).
• Asynchronous time advance: each variable in the quantized system updates its value independently of the state of the other variables. Contrary to conventional methods, where all variables are calculated in each integration step, in QSS each variable is updated only when needed (|qi(t−) − xi(t)| = ∆Qi), at different time instants. This provides significant performance advantages for systems with sparse matrices.

• Dense output: the output is represented by piecewise polynomial trajectories according to the QSS method order. Figure 2.7 shows how QSS1 generates piecewise constant trajectories, QSS2 generates piecewise linear trajectories, and QSS3 generates piecewise parabolic trajectories. This allows for a straightforward output interpolation which is ensured to remain inside the theoretical error bound. This is an important characteristic, very useful for asynchronous and hybrid systems.

• Efficient simulation of discontinuities: given the asynchronous and discrete-event nature
of QSS methods, each discontinuity is handled naturally and efficiently. On the contrary,
discrete-time methods require special procedures to perform a step at the exact moment
when an input change occurs. This gives QSS methods intrinsic advantages for simulating
discontinuous systems [78].
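The bounded-error and asynchronous-advance properties above can be illustrated with a minimal sketch (a didactic QSS1 integrator written for this text, not PowerDEVS code) that solves ẋ(t) = −x(t) with x(0) = 1 and checks that the quantized trajectory never departs from the analytic solution e^(−t) by more than the quantum ∆Q:

```python
import math

def qss1(f, x0, dq, t_end):
    """Minimal first-order QSS integrator for a scalar ODE x' = f(q).

    The state x follows a piecewise linear trajectory; the quantized
    state q changes (an 'event') only when |x - q| reaches the quantum dq.
    """
    t, x, q = 0.0, x0, x0
    events = [(t, q)]
    while t < t_end:
        dx = f(q)                       # slope is constant until the next event
        if dx == 0.0:
            break                       # no more events: x stays at q forever
        # time until the linear trajectory departs from q by one quantum
        dt = (dq if dx > 0 else -dq) / dx
        t += dt
        x += dx * dt                    # advance the piecewise linear state
        q = x                           # quantization event: q jumps to x
        events.append((t, q))
    return events

events = qss1(lambda q: -q, x0=1.0, dq=0.01, t_end=3.0)
# For this LTI system the theoretical global error bound is the quantum dq
worst = max(abs(q - math.exp(-t)) for t, q in events)
```

Note how the step size dt grows automatically as the solution flattens: the method performs roughly 100 events to cover three time constants, each triggered by the state itself rather than by a global clock.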

Figure 2.7: Input and output trajectories in the quantization procedure for different QSS orders
of accuracy [5], [72]. (a) Zero-order, piecewise constant quantizer. (b) First-order, piecewise
linear quantizer. (c) Second-order, piecewise parabolic quantizer.

2.5.2 Relationship Between QSS and DEVS


QSS methods result in discrete-event approximations of ODEs. Figure 2.8 shows a block diagram
representation of Equation (2.10). The blocks F1 , ..., Fn represent static functions that calculate
the right-hand side of the equation. The HQI blocks represent the hysteretic quantized integrators,
each composed of an integrator block and a hysteretic quantization block. The F1 , ..., Fn blocks
receive piecewise constant input trajectories qi and uj and calculate the state derivative trajectories
ẋk , which are therefore also piecewise constant. Since the system exchanges only piecewise constant
trajectories, these can be represented by sequences of events. Consequently, equivalent DEVS models
can be found for these blocks, called static (or memoryless) functions and quantized integrators
[5].

The QSS approximation can be exactly simulated by a DEVS model consisting of the coupling
of quantized integrators, static functions, and signal sources [8], [79].
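This coupling can be sketched in a few lines (illustrative Python written for this text, not PowerDEVS code) for the damped oscillator ẋ1 = x2, ẋ2 = −x1 − x2: two quantized integrators exchange events with the static functions computing the right-hand side, each advancing at its own pace:

```python
import math

# Static (memoryless) functions F1, F2: the ODE right-hand side,
# evaluated on the quantized states q (the events exchanged between blocks).
F = [lambda q: q[1],              # x1' = x2
     lambda q: -q[0] - q[1]]      # x2' = -x1 - x2  (damped oscillator)

def next_event(x, q, dx, dq):
    """Time until the linear trajectory x departs from q by one quantum."""
    if dx == 0.0:
        return math.inf
    return ((q + dq - x) if dx > 0 else (q - dq - x)) / dx

def qss1_coupled(x0, dq, t_end):
    t = 0.0
    x = list(x0)                   # continuous states (piecewise linear)
    q = list(x0)                   # quantized states (piecewise constant)
    dx = [f(q) for f in F]
    steps = [0, 0]                 # events fired by each quantized integrator
    while t < t_end:
        # each integrator schedules its own next event (asynchronous advance)
        dts = [next_event(x[i], q[i], dx[i], dq) for i in range(2)]
        i = dts.index(min(dts))
        if dts[i] == math.inf:
            break
        dt = dts[i]
        t += dt
        x = [x[j] + dx[j] * dt for j in range(2)]  # advance linear trajectories
        q[i] = x[i]                 # event: integrator i emits its new q_i
        dx = [f(q) for f in F]      # static functions react to the new event
        steps[i] += 1
    return t, x, steps

t, x, steps = qss1_coupled([1.0, 0.0], dq=0.001, t_end=10.0)
```

The event counts in `steps` generally differ between the two integrators, reflecting the asynchronous time advance of the coupled DEVS representation.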

Figure 2.8: Block diagram representation of the QSS approximation of the state system in Equation (2.10).

2.6 PowerDEVS Simulator


PowerDEVS [66] is a general purpose discrete-event simulator that implements the DEVS mathe-
matical formalism. It was specifically conceived for the simulation of hybrid systems, and all QSS
integration methods are implemented in it.
PowerDEVS provides a graphical user interface (GUI), similar to Simulink's, for manipulating
block diagrams that represent DEVS atomic and coupled models. Atomic models are defined
in C++ classes following a well-defined interface (virtual methods for each DEVS function). The
PowerDEVS GUI provides a basic editor for atomic models, where the different DEVS transitions are
shown in separate tabs. DEVS coupled models are defined by dragging, dropping and interconnecting blocks.
The complete visual model definition is transparently translated into C++ and compiled together
with the simulation core to produce an executable of the model.
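The idea of an atomic model as a class with one method per DEVS function can be sketched as follows (illustrative Python written for this text; PowerDEVS exposes the equivalent functions as C++ virtual methods, and the names used here are not its actual API):

```python
class Atomic:
    """Schematic DEVS atomic model interface: one method per DEVS function."""
    def ta(self):              raise NotImplementedError  # time advance
    def delta_int(self):       raise NotImplementedError  # internal transition
    def delta_ext(self, e, x): raise NotImplementedError  # external transition
    def output(self):          raise NotImplementedError  # output function (lambda)

class Generator(Atomic):
    """Emits an event every `period` time units, carrying a running count."""
    def __init__(self, period):
        self.period, self.count = period, 0
    def ta(self):
        return self.period
    def delta_int(self):
        self.count += 1
    def delta_ext(self, e, x):
        pass  # this model has no inputs
    def output(self):
        return self.count

def simulate(model, t_end):
    """Minimal abstract-simulator loop for a single input-less atomic model."""
    t, events = 0.0, []
    while t + model.ta() <= t_end:
        t += model.ta()
        events.append((t, model.output()))  # lambda is evaluated before delta_int
        model.delta_int()
    return events

evts = simulate(Generator(0.25), 1.0)  # four events, at t = 0.25, 0.5, 0.75, 1.0
```

A real DEVS simulator additionally coordinates couplings and external events; the loop above only conveys the separation between the model (the class) and the simulator (the loop).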
PowerDEVS provides a wide library of pre-developed blocks that allow modeling continuous and
hybrid systems without coding C++ atomic models (integrators, mathematical functions, models
for handling discontinuities, sources, etc.). A user without knowledge of DEVS or QSS can specify
continuous system equations as block diagrams using the GUI. Figure 2.9 shows a screenshot of
the PowerDEVS GUI with the continuous block model library and the coupled model of an inverted
pendulum.
Of particular relevance for solving network fluid models is the development of QSS Delay Differential
Equations (DDEs) [80], which allows dynamically changing delays (such as those experienced by
network traffic) to be applied by reusing an already implemented DEVS atomic model.

Figure 2.9: PowerDEVS GUI. Model library (left). Inverted pendulum model (right).

Over the last decade PowerDEVS has been extended with several new domain-specific model
libraries, in particular data network models, and it has been used for packet-level simulation
(queues, servers, traffic generators, an implementation of TCP, etc.) [81].
Summary: Preliminary Background

The literature distinguishes three approaches to study the performance of data networks: mathematical
(or analytical) modeling [28], [29], modeling and simulation [3], and testing on the real devices [2].
We focus on modeling and simulation techniques, as they remain flexible in large-scale network scenarios.

Many packet-by-packet simulation tools exist (e.g., ns-2 [18], ns-3 [19], OMNET++ [20], OPNET [21]).
Few simulators are backed by formal specifications. Simulation time grows (at least) linearly with
the number of nodes and the link speed, which makes these tools unsuitable for large high-speed networks.

A prominent model of ODE-based fluid approximations is the one presented by Misra, Gong and
Towsley (MGT) [24], [43], [44]. The ODEs require numerical solutions whose cost grows linearly
only with the number of nodes and is (a priori) independent of link speeds. Tools to solve ODEs
include Matlab [51], Modelica [52] and libraries for specific languages (e.g., GSL for C++ [53] or
Python [54]), and custom implementations of the classic methods are also common [10], [24].

Classically, each approach requires different expertise and tools, which leads network experts to
adhere to only one of them. This is because each approach relies on a different paradigm: discrete
events, discrete time or continuous time.

Dynamical systems can be divided into three categories according to the (discrete or continuous)
evolution of their variables over time. Continuous systems require numerical approximations, which
traditionally discretize the time variable [48]–[50]. Systems combining continuous and discrete
characteristics are called hybrid systems, and the discontinuities in their differential equations
are not easily handled by traditional methods [5].

DEVS [7], [62], [63] is a mathematical formalism for M&S that allows the modular and hierarchical
construction of systems using atomic (behavioral) and coupled (structural) DEVS models. DEVS can
describe exactly any discrete system and approximate numerically any continuous system with any
desired degree of accuracy. Quantized State Systems (QSS) are a family of numerical methods [5],
[8], [72] that replace time discretization with the quantization of the state variables. Proven
properties of QSS include convergence [8], stability [8], globally bounded error [77], asynchronous
time advance, dense output, and efficient simulation of discontinuities [78]. QSS methods can be
represented by DEVS discrete-event systems [79].

Several practical tools implement the DEVS formalism [64]–[68]. PowerDEVS [66] is a general-purpose
simulator that implements DEVS and all the QSS integration methods. It was specifically conceived
for the simulation of hybrid systems and is the tool of choice for the developments in this Thesis.

Chapter 3

Motivating Case Study and Methodology

All models are wrong but some are useful

George Box (1978)

Robust engineering methodologies for product life cycle control have proven to be a cornerstone
in modern software/hardware development projects.
Simultaneously, various modeling and simulation (M&S) techniques have been increasingly
adopted in complex systems design, particularly in scenarios where it is difficult to predict the
system's behavior as changes are introduced.
In this Chapter we introduce the complex networked system that motivates our developments
of simulation models all across the Thesis. To set the context, a brief description of the ATLAS
experiment at the LHC accelerator in CERN is presented.
The Thesis comprises a M&S project spanning ∼ 5 years of development of simulation models
for varied purposes and different contexts within the LHC schedule (Run2, LS2 and LS3). Thus,
a well-defined methodology was required in order to keep a coherent evolution of simulation tools
and simulation models, while keeping a focus on the project’s goals.
A salient feature of DEVS is its strict separation between a model's definition (the model) and
an algorithm capable of simulating such a model (the simulator). While a technical introduction to
DEVS was presented in Chapter 2, we now leverage DEVS properties from a methodological point
of view. DEVS is also a convenient framework to organize phases such as systems analysis,
experimental frame definition, model-to-simulator verification, and model-to-system validation.
We present a custom DEVS-based methodology for M&S-driven engineering projects. It in-
tegrates software development best practices, further tailored to a large-scale networked
data acquisition system in the ATLAS particle detector at CERN [1]. This project poses M&S
challenges from several viewpoints, including system complexity, tight delivery times, the quality
and flexibility of the developed models and tools, and the interdisciplinary communication of results
to collaborators (mostly scientists). The methodology presented here is used throughout the Thesis
for the development of all models, tools and methods.

The development of simulation software shares some characteristics with classic software de-
velopment but requires the inclusion of specific M&S concepts.


3.1 Case Study Scenario: The ATLAS Data Acquisition Network at CERN
Several developments in the Thesis are motivated by the real case study of the triggering network in
the ATLAS experiment at CERN. We provide a brief contextual introduction to the main facilities
of the accelerator and detector. More details are introduced in subsequent chapters as required,
while full details can be found in the technical references.

3.1.1 The Large Hadron Collider at CERN


The Large Hadron Collider (LHC) [82] is the world's largest particle accelerator, 27 kilometers
in circumference, located at the European Organization for Nuclear Research (CERN). The LHC
was designed to accelerate bunches of particles (protons or ions) up to an energy of 14 TeV and
a luminosity of 10³⁴ cm⁻²s⁻¹. Collisions occur every 25 ns near large detectors, including ATLAS
[1], CMS [83], ALICE [84], and LHCb [85] as depicted in Figure 3.1.

Figure 3.1: The LHC accelerator facilities.



The LHC operates in alternating cycles of data-taking periods (usually called "Runs") and upgrade
periods (usually called "Long Shutdowns", LS) as shown in Figure 3.2. Currently the LHC
schedule is planned until 2035, and studies for Future Circular Colliders are ongoing to take the
collision energy to 100 TeV in the following years [86].

Figure 3.2: The LHC upgrade schedule and associated luminosity [87].

3.1.2 The ATLAS Detector


The ATLAS collaboration is composed of more than 4000 scientists and students from 38 countries
and 181 universities and laboratories around the world [88]. The collaboration constructed the
ATLAS (A Toroidal LHC ApparatuS) detector [1], a general purpose particle detector where
collisions generate very high energies, enabling the search for novel physical evidence (Higgs
boson, extra dimensions, dark matter, etc.). ATLAS is composed of different detector technologies,
referred to as sub-detectors and depicted in Figure 3.3, which identify a full range of particles that
may be produced during collisions and help in the reconstruction of electrons, photons, muons,
tau leptons, jets and missing energy from undetected neutrinos.
Tens of millions of electronics channels register particle-induced signals in the detector, which
are digitized for further analysis. The data corresponding to each bunch collision is called an
Event.
The ATLAS experiment has a data-intensive program which relies on ubiquitous, high-
performing networks to enable its distributed infrastructure [6]. The generated data is distributed to
thousands of physicists all over the world through networks that provide 1, 10 and 100 Gbps of
bandwidth. Figure 3.4 shows that ATLAS wide-area network usage is rapidly increasing over the years,
exemplifying the importance of networking for the ATLAS experiment.

Figure 3.3: ATLAS particle detector at CERN

Figure 3.4: ATLAS wide-area network use with a trend-line showing a 164% increase [6]

3.1.3 Trigger and Data Acquisition System


The raw amount of information produced by the ATLAS detector exceeds 60 Terabytes/second.
To assimilate this throughput, ATLAS uses a sophisticated and highly distributed filtering system
called the Trigger and Data Acquisition (TDAQ) [89]. Based on specialized physics algorithms,
the TDAQ system decides in quasi real-time if each Event should be permanently stored for future
off-line analysis or if it is uninteresting and can be safely discarded.
The TDAQ system is composed of two levels which reduce the event rate from the design bunch-
crossing rate of 40 MHz down to an average event recording rate of about 1 kHz as shown in Figure
3.5. The First Level Trigger (L1) filters Events from an initial raw rate of 40 million Events/second
down to a filtered rate of 100 thousand Events/second. L1-accepted Events are temporarily stored
in a ReadOut System (ROS computer farm) in the form of data structures called Fragments, and
then accessed by a second-level filter called the High Level Trigger (HLT). At the HLT, physics
algorithms reanalyze the Fragments (this time with a different granularity), retaining only
1 thousand “interesting” Events/second.
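A back-of-the-envelope calculation (assuming a uniform Event size, which in reality varies) connects these filtering rates to the data bandwidths involved:

```python
raw_bw_Bps  = 60e12   # detector output: ~60 Terabytes/second
raw_rate_hz = 40e6    # bunch-crossing rate: 40 MHz
l1_rate_hz  = 100e3   # L1-accepted rate: 100 thousand Events/second
hlt_rate_hz = 1e3     # HLT recording rate: 1 thousand Events/second

event_size_B = raw_bw_Bps / raw_rate_hz   # ~1.5 MB per Event
l1_bw_Bps  = event_size_B * l1_rate_hz    # ~150 GB/s flowing into the ROS
hlt_bw_Bps = event_size_B * hlt_rate_hz   # ~1.5 GB/s recorded for off-line analysis
```

These orders of magnitude are what the HLT-ROS network studied in this Thesis must sustain.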
The TDAQ system and its HLT-ROS Data AQuisition (DAQ) network is our real System
Under Study, for which we will describe modeling and simulation techniques and processes.

Figure 3.5: The ATLAS TDAQ system in Run 2.

3.2 Modeling and Simulation-Driven Methodology


3.2.1 Context and Requirements
During LS1, the TDAQ HLT filtering farm was subject to changes in hardware and control algo-
rithms that affect network topology and throughput. Predicting the impact of such changes is not
straightforward. Thorough designs and benchmark studies on system components provide more
confidence, but they require access to the hardware in advance. In the end, testing the system as
a whole happens only at the final integration phase of a project.
The full TDAQ system was available for testing only about one out of every six weeks (during
scheduled technical runs), which delays the testing of new control algorithms that are continuously
improved but cannot be fully validated until the full system is in place.
Table 3.2 lists the resulting requirements obtained during system analysis meetings. Moreover,
these initial requirements are likely to change dynamically throughout a project's lifetime, with
different experts having different analysis requirements on the same components.

Requirement: Evaluate candidate changes for the network and control algorithms before their
commissioning.
Goal: Early risk assessment.

Requirement: Define in advance the best set of tests to be performed on the real system, during
scarce windows of availability.
Goal: Harnessing the test window to focus on the most relevant questions.

Requirement: Flexibility for choosing the level of detail/accuracy with which the evaluations are
obtained.
Goal: Dynamically adapt to different and complex modifications that need to be assessed, and to
schedule changes.

Table 3.2: TDAQ requirements obtained during system analysis meetings.

3.2.2 DEVS-Based Iterative Methodology


Figure 3.6 depicts the proposed iterative, process-based methodology, which implements an engi-
neering strategy driven by modeling and simulation. The approach is partially inspired by the well-
known Balci-Nance life cycle reference for simulation projects [90], in combination with Zeigler's
view on the Experimental Frame [7].
At the methodology's core, the System, Model and Simulator entities are strictly separated
yet formally related by the DEVS framework. Each of these entities is associated with its own
experimental frame (EF), parameters (θ), and experimental results (λ). The real (or "source")
System is experimented with under a system experimental frame (EFS ), with questions encoded in the
form of system parameters θS that define experimental conditions. Experimental results relevant
to the original questions are stored in a system behavior database λS .
Every new DEVS Model is built for a pair {System, EFS } according to a modeling relation
and guided by selected homomorphisms/isomorphisms. The model experimental frame (EFM )
also allows for questions about model attributes (e.g. coupling density, model topology, types of
variables -discrete, continuous-, etc.) The EFM has no access to the real system and is independent
of any simulation exercise. Model parameters θM are used to query model attributes and answers
stored in the model database λM .
A DEVS Simulator reads a DEVS Model and produces an output trajectory by obeying the
model’s dynamics (in short, a DEVS model is simulated). Its most common realization is a com-
puter program, usually referred to simply as a simulator, which is constructed, adapted, and
maintained to read and compute DEVS models efficiently within their EFM . This establishes
a simulation relation. The compute experimental frame (EFC ) defines new questions and pa-
rameters θC for experimenting with (simulating) the computable model. It also hosts simulation
results in a compute behavior database λC . The validation relationship lets us relate back to the
original real system to validate correctness (λS versus λC ) or to perform scans over EFS due to
unexpected/surprising observations discovered in the EFC .

Figure 3.6: Modeling and Simulation-Driven Engineering. Methodology diagram based on the
DEVS formal framework. Iterative cycles and incremental phases

3.2.2.1 Cycles and Phases


The flow across the DEVS formal framework follows a System → Model → Simulation path. The
flow of tasks is organized in three cycles depicted in Figure 3.6:
• Build (the model), in blue: starts with observation and measurement of the system. Its
objective is to provide quality models that, once simulated, will exhibit an adequate degree
of validation against the original system.

• Hypothesis (on the system), in orange: the hypothesis cycle exercises several candidate
changes on the model, which may later be applied to the real system. Its goal is to
find improvement opportunities for the system when it is still unavailable or when direct
experimentation is too expensive or inaccessible.

• Explore (simulation results), in green: the explore cycle starts with analyzing the large
amounts of information produced by simulations; its goal is to discover properties and
correlations unthought-of during the experimentation phases.

Cycles need not occur in any specific order (although a build cycle is usually required at the
beginning of a project).
Two parallel and cooperative phases are defined for each evolution throughout a cycle:

• System study phase: drives progress according to questions about the system under study

• Tools development phase: seeks to improve the supporting software algorithms and inter-
faces, leveraging modeling, simulation, and analysis capabilities

This approach leads to models that reproduce relevant behaviors of the real system within
reasonable simulation times: less relevant dynamics are kept out of the model (e.g. details of
the network physical layer). The relevancy and reasonability aspects change dynamically with
each new question to be answered and each new project context (e.g. delivery times, available
computing power).
The methodology also offers a guideline for development phases of the underlying M&S software
tools; new features are added to the tools at specific phases, responding to specific needs, framed
within unambiguous cycle goals.

3.2.3 Relationship with Existing Techniques and Methods


Modern software development processes and methodologies contributed by the software engineer-
ing discipline rely on frameworks that control software life cycles. Popular ones are test-driven
development (TDD), extreme programming, and the Rational Unified Process (RUP). Some of
these foster practices such as pair programming or code reviews, whereas others propose iterative
and incremental cycles, with frequent deliveries focused on adding value quickly.
The proposed methodology shares some aspects with other approaches [2]. Proposed phases
can be related for example to the well-known inception, elaboration, construction, and transition
cycles in RUP [91]. However, none of the aforementioned methods include the formal M&S aspects
provided by DEVS: strict separation between modeling formalism, abstract simulation mechanism,
and code implementation (of both model behavior and simulation engines). The proposed approach
is inspired by the Balci-Nance life cycle reference for simulation projects [90]. This gives the
advantage of independence between the experimental frames for the real system, the model, and the
simulator, so that enhancements in any of these three areas propagate straightforwardly to the others.
In typical software-based projects it is unusual to develop the base tools themselves while executing
the project. However, in M&S-driven scientific projects, the base tools for modeling, simulation,
and data analysis are crucial assets that impose their own requirements alongside those of the
model itself. The proposed methodology naturally satisfies this need.
Large sets of simulation results can support data-driven hypotheses and predictive analytics. A
well-structured simulation database together with reusable data analysis libraries can systematize
different layers of information aggregation, enabling stratified levels of analyses. The proposed
methodology fosters this approach.

3.3 Other Methodological Considerations


We adopted a bottom-up approach, focusing first on the basic low-level behaviour of the network and
the protocols' control logic, including their standalone validation. Then, the basic models are integrated
into a larger system. This approach allows for emergent behaviour at the system level, which can
be validated against the real system. It follows the conceptual modeling guidelines in [92]: starting
with a simple model and adding details progressively until sufficient accuracy is obtained.
The models and tools developed in this work are, in a first approximation, not intended to
compete with well-established network simulation tools in terms of number of features or variety
of available out-of-the-box libraries (e.g. implementing the many flavours of the TCP protocol).
Our packet-level models were designed with as much level of detail as required by the case
studies of the TDAQ network, favouring the simplest models that meet the project's objectives
over unnecessarily complex alternatives. The focus is set on demonstrating how a well-
defined M&S formalism and methodology can be used for detailed packet-level simulation, setting
the ground for further comparisons against fluid-flow approaches (Chapter 5) and finally evolving
into hybrid network simulation (Section 5.6).
As stated in Chapter 2, it is recommended to use at least two performance evaluation approaches
[2]. In this work we developed new simulation models which are compared and contrasted against
analytic models (Section 4.5), other well-known simulators (Section 4.3.2), and real-world test sce-
narios (Section 4.4.3).
Summary: Case Study and Methodology

This Chapter presents the real case study of the filtering network in the ATLAS experiment at
CERN, together with the methodology used throughout the Thesis.

The Large Hadron Collider (LHC) [82] is the largest particle accelerator, located at the European
Organization for Nuclear Research (CERN). The LHC accelerates bunches of particles (protons or
ions) up to an energy of 14 TeV and a luminosity of 10³⁴ cm⁻²s⁻¹. Collisions occur every 25 ns near
large detectors, including ATLAS (A Toroidal LHC ApparatuS) [1], a general purpose particle
detector composed of sub-detectors that identify a wide range of particles produced during the
collisions. The signals induced by the particles are digitized, producing an amount of information
that exceeds 60 Terabytes/second. To assimilate this information, ATLAS uses the highly distributed
Trigger and Data Acquisition (TDAQ) system [89] to filter the data in real time.

The TDAQ system is composed of two levels. The first level (L1) uses hardware components to
filter the data down to ∼160 Gigabytes/second and stores it temporarily in the ReadOut System
(the ROS computer farm). The second level (HLT) accesses the stored data selectively, driven by
a series of physics algorithms. The TDAQ system and its HLT-ROS network is our real system
under study (see Figure 3.5), for which modeling and simulation techniques and processes are
described. Other simulation models of the TDAQ system that were taken into consideration
include [93]–[98].

Figure 3.6 describes the proposed iterative, process-based methodology, which implements an
engineering strategy driven by M&S. The approach is partially inspired by the Balci-Nance life
cycle [90] in combination with Zeigler's view of the Experimental Frame [7]. At the core of the
methodology, the System, Model and Simulator entities are strictly separated but formally related
under the DEVS framework. The flow of tasks is organized in three cycles: Build (the model);
Hypothesis (on the system); Explore (simulation results). Two parallel and cooperative phases
are defined: a system study phase and a tools development phase.

This approach leads to models that reproduce the relevant behaviors of the real system within
reasonable simulation times: less relevant dynamics are kept out of the model (for example, details
of the network physical layer). The packet-level models are designed with the level of detail
required by the TDAQ network case study, favoring the simplest models that meet the project's
objectives over unnecessarily complex alternatives. We adopted a bottom-up approach, following
the conceptual modeling guidelines in [92], starting with a simple model and adding details
progressively until sufficient accuracy is obtained. This approach allows for emergent behavior at
the system level, which can be validated against the real system.

Chapter 4

Packet-Level Network Simulation

Nobody trusts a simulation result, except the person who made it.
Everybody trusts an experimental result, except the person who made it.

4.1 Introduction and motivation


Network simulation is an inexpensive and reliable instrument to develop and test ideas when either
analytical models or experimental approaches are not available [2]. For the ATLAS experiment,
networks are essential to the distributed computing model and trigger installations. End-to-end
network performance and network problems have a significant impact on the ability of ATLAS
physicists to reach their scientific goals in a timely manner [6].
The motivating case study in this Chapter, briefly introduced in Section 3.1, is the Trigger
and Data AcQuisition (TDAQ) network in the ATLAS experiment at CERN. Simulation models
provide extra means for the understanding of network size requirements, and the highly distributed
algorithms supporting its data flow.
This Chapter describes the bottom-up development of a packet-level simulation model based on
the DEVS formalism. We start with a TCP protocol model, then models supporting the construc-
tion of network topologies, and later the representation of TDAQ applications. The DEVS-based
simulation results are compared against selected TDAQ measurements. Finally, load balancing
strategies are studied through packet-level simulation and compared against analytic results from
queuing theory. The best performing load-balancing scheme in terms of service time is implemented
in the context of the TDAQ system, first in simulation and then in the real-world network.
Results show that our packet-level simulations are credible, setting the ground to evolve into
fluid approximations relying on the same DEVS-based technology, which will be the subject of
Chapter 5.

4.2 Preliminaries and Related Work


We provide technical definitions (for congestion control and scheduling) and application context
(for the ATLAS TDAQ Network at CERN) relevant for understanding the new packet-level
models developed, which will be presented in Section 4.3. Concepts of queuing theory for load
balancing are also introduced, as required for the results presented in Section 4.5.2.
Readers familiar with any of these topics can skip them at their convenience.

4.2.1 Relevant Network Congestion Control Mechanisms


4.2.1.1 Transmission Control Protocol (TCP)
The Transmission Control Protocol (TCP) [RFC 793] sits on layer 4, the transport layer, of the
Open Systems Interconnection (OSI) model [99]. The goal of TCP is to guarantee reliable and
ordered delivery of packets for each connection, detecting possible data losses and re-transmitting
packets when necessary. TCP also provides congestion control mechanisms, in an attempt to
enforce a fair sharing of the available bandwidth among competing connections while avoiding a
congestion collapse. There is great interest, both practical and academic, in studying TCP through
modeling and simulation, as it controls approximately 90% of the total internet traffic [100].
Congestion control in TCP can be considered a distributed control algorithm, as it operates
at the sender and receiver ends, in principle ignoring the underlying network complexity. Each
TCP connection shares and competes for limited resources (e.g. link bandwidth) with other data
traffic not necessarily controlled by TCP. The first basic congestion control mechanism was introduced
in TCP in 1988 [101], establishing that the sender senses the network congestion state based on
acknowledgement packets (ACKs) sent by the receiver to notify the successful reception of packets.
When buffers along the network fill up, new incoming packets are discarded, which is in turn
sensed as a congestion signal at the sender end: either by receiving an ACK with a sequence
number already received before (duplicate acknowledgement, DUP-ACK), or after a retransmission
time elapses without receiving new ACKs (retransmission timeout, RTO). Many variants of this
approach have been proposed over the years, leading to the wide range of TCP versions available today.
A summary of most of the RFC documents defining TCP and its extensions can be found in [RFC
7414]. Here we introduce the basic mechanisms available in most TCP versions.
TCP uses a sliding window mechanism to determine which packets will be sent to the network
[102]. Packets are assigned unique consecutive sequence numbers, which the receiver uses to
identify missing packets and to send ACKs for the latest ordered packet received. Both ends track
the current sequence numbers SEQs , SEQr and window sizes Ws , Wr , respectively for the sender
and receiver. The sender buffers application data to guarantee that there are never more in-flight
packets (sent and not yet ACKed) than min(Ws , Wr ). Upon processing a packet (in the receiver) or
receiving an ACK (in the sender) the window slides by one packet: in the sender, the ACKed packet
is removed from the buffer and the next packet in the queue can be sent.
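The sender-side bookkeeping described above can be sketched as follows (a didactic fragment written for this text; it ignores retransmissions, timers and the byte-granular windows that real TCP implementations use):

```python
from collections import deque

class SlidingWindowSender:
    """Didactic sketch of the sender side of a packet-granular sliding window."""
    def __init__(self, ws, wr):
        self.ws, self.wr = ws, wr      # sender and receiver window sizes
        self.next_seq = 0              # next sequence number to assign
        self.in_flight = deque()       # sent but not yet ACKed packets

    def effective_window(self):
        # never more in-flight packets than min(Ws, Wr)
        return min(self.ws, self.wr)

    def try_send(self):
        """Send as many packets as the window allows; return their SEQ numbers."""
        sent = []
        while len(self.in_flight) < self.effective_window():
            self.in_flight.append(self.next_seq)
            sent.append(self.next_seq)
            self.next_seq += 1
        return sent

    def on_ack(self, ack_seq):
        """Cumulative ACK: everything up to ack_seq was received; the window slides."""
        while self.in_flight and self.in_flight[0] <= ack_seq:
            self.in_flight.popleft()

s = SlidingWindowSender(ws=4, wr=8)
first = s.try_send()    # SEQ 0..3 go out: min(Ws, Wr) = 4 packets in flight
s.on_ack(1)             # packets 0 and 1 are ACKed, so the window slides...
second = s.try_send()   # ...and two more packets (SEQ 4 and 5) may be sent
```

Congestion control, described next, amounts to making `ws` a dynamic quantity updated by the phase-dependent rules of each TCP version.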
TCP versions distinguish different phases during an established connection, and in each phase
the congestion window size Ws is updated according to different criteria. The TCP-Reno [103] state
machine for the transitions between phases is shown in Figure 4.1. Some of the common phases and
window update modes found in TCP versions are:
• Slow Start (SS): This is generally the starting phase where for each ACK received Ws is
increased by one packet, effectively allowing two new packets to be transmitted. This quickly
increases Ws until the slow start threshold ssthre is reached and the connection transitions
to the CA phase.
52 Chapter 4. Packet-Level Network Simulation
• Congestion Avoidance (CA): This phase updates Ws following algorithms that tend to
converge to a fair share of the link bandwidth among multiple flows. TCP-Reno and TCP-
Tahoe use the Additive Increase-Multiplicative Decrease (AIMD) [104] algorithm, with linear
growth of Ws when there is no congestion and a multiplicative reduction when congestion
occurs. TCP-BIC uses an additional binary search increase, and TCP-CUBIC uses a
cubic function. In TCP-Reno, upon receiving an ACK packet the window is increased as
Ws = Ws + 1/Ws (Additive Increase). Upon sensing congestion the window is roughly halved,
Ws = max(1, Ws/2) (Multiplicative Decrease). The actual decrease depends on the
congestion signal: after an RTO the window is set to Ws = 1 and the connection transitions
to the EB phase with ssthre = Ws/2. In TCP versions with an FR phase, the connection
transitions to that phase upon receiving a Triple Duplicate ACK (3DACK).
• Exponential Backoff (EB): During this phase, each time an RTO is signaled a single packet
is retransmitted and the RTO is doubled. Upon receiving a valid ACK the connection
transitions to the SS phase.
• Fast Retransmit (FR): Upon receiving a 3DACK in the CA phase, some TCP versions infer
that the missing packet was dropped due to congestion and retransmit it without waiting for
an RTO. TCP-Tahoe sets the slow start threshold to ssthre = Ws/2 and the window to
Ws = 1, and transitions to the SS phase. TCP-Reno sets ssthre = Ws = Ws/2 and transitions
to the FR/FR phase. TCP-NewReno sets ssthre = Ws/2, Ws = ssthre + 3, and transitions to
the FR/FR phase.
• Fast Recovery (FR/FR): TCP-Reno retransmits the missing packet, waits for the entire
window to be ACKed and then transitions to CA (otherwise an RTO occurs and the connection
transitions to SS). TCP-NewReno does the same, but additionally sends a new packet for every
DUP-ACK received.
Different TCP versions update the congestion window differently, but their dynamics are strongly
dominated by the Round Trip Time (RTT), defined as the time elapsed between sending a packet
and receiving its ACK back at the sender. Most TCP implementations maintain an estimated RTT,
which also reflects the time it takes TCP to detect congestion.
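As a compact illustration of the phase logic above, a simplified TCP-Reno update rule can be sketched as follows (our own hypothetical simplification; e.g. window inflation during Fast Recovery is omitted):

```python
# Simplified sketch of TCP-Reno phase transitions and window updates.
# Window Ws and threshold ssthresh are in packets.

def reno_on_ack(phase, ws, ssthresh):
    """Update upon a valid (non-duplicate) ACK."""
    if phase == "SS":
        ws += 1.0                       # one packet per ACK: Ws doubles every RTT
        if ws >= ssthresh:
            phase = "CA"
    elif phase == "CA":
        ws += 1.0 / ws                  # Additive Increase
    elif phase == "FR/FR":
        ws, phase = ssthresh, "CA"      # deflate once the whole window is ACKed
    elif phase == "EB":
        phase = "SS"
    return phase, ws, ssthresh

def reno_on_congestion(signal, ws, ssthresh):
    """Update upon a congestion signal (RTO or Triple Duplicate ACK)."""
    if signal == "RTO":
        return "EB", 1.0, ws / 2.0      # restart with a halved threshold
    if signal == "3DACK":
        return "FR/FR", ws / 2.0, ws / 2.0   # Multiplicative Decrease
    raise ValueError("unknown congestion signal: " + signal)
```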

4.2.1.2 Traffic Shaping


TCP provides good traffic control to avoid network saturation but there is a limit on how much
control can be accomplished only from the edges of the network. Therefore, several mechanisms
are implemented in the routers to complement TCP congestion control mechanisms. Two main
classes can be distinguished: 1) Active Queue Management (AQM) and 2) Scheduling.
AQM schemes signal potential congestion, before buffers actually get saturated, and achieve
a smoother buffer size control that reduces jitter. This is a key factor that made possible an
exponentially rapid, yet decentralized growth of the Internet. Many AQM strategies exist in
the literature [106], and those designed to cooperate with TCP are called TCP/AQM. The most
implemented TCP/AQM algorithm in routers is Random Early Detection (RED) [107], which
discards packets according to a probabilistic function based on the buffer occupancy. In most cases
RED responds to a time-averaged queue length $\hat{q}(t)$ rather than to the instantaneous $q(t)$,
making it less sensitive to packet bursts. If the queue has been mostly empty, RED will tend not
to drop packets even when the buffer is momentarily almost full, so tail-drops can still occur.

Figure 4.1: State machine of TCP transmission phases [105].

Figure 4.2: Evolution of the TCP Reno congestion window.

RED has two main stages: 1) estimation of the average queue size $\hat{q}(t)$, and 2) the decision of
whether or not to drop an incoming packet [108]. Queue size estimation uses an exponentially weighted
moving average (EWMA) which acts as a low-pass filter: $\hat{q}(t_k) = (1-\alpha)\,\hat{q}(t_{k-1}) + \alpha\, q(t_k)$, where $\alpha$
is the time constant of the filter. To drop packets, RED uses a drop probability function $p(\hat{q}(t))$
as defined by Equation 4.1, where the queue estimate $\hat{q}(t)$ and the maximum and minimum thresholds
$(t_{max}, t_{min})$ define the assigned probabilities $(p_{min}, p_{max})$. The function $p(\hat{q}(t))$ is shown in Figure 4.3.
Note that it is rather counter-intuitive that packets which could perfectly be buffered are deliberately
dropped to prevent congestion.

Another common AQM mechanism found in routers is Explicit Congestion Notification (ECN)
[RFC 3168]. On top of using missing packets to implicitly signal congestion, special packet header
fields are used to explicitly mark congestion along the path. Routers use mechanisms similar to
RED but, instead of randomly dropping packets, mark them with the Congestion Experienced
(CE) codepoint, which end nodes read and react to accordingly. This technique requires
that both ends of a TCP connection, and all routers and switches along the path, support ECN
features.



\[
p(\hat{q}(t)) =
\begin{cases}
0 & 0 \le \hat{q}(t) \le t_{min} \\[4pt]
\dfrac{\hat{q}(t) - t_{min}}{t_{max} - t_{min}}\; p_{max} & t_{min} \le \hat{q}(t) \le t_{max} \\[4pt]
1 & t_{max} \le \hat{q}(t)
\end{cases}
\tag{4.1}
\]

Figure 4.3: RED drop probability function as defined by Equation 4.1.
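The two RED stages can be sketched as follows (an illustrative Python sketch of the EWMA filter and of Equation 4.1; function and parameter names are ours):

```python
import random

def red_update_estimate(q_hat, q, alpha):
    # EWMA low-pass filter: q_hat(k) = (1 - alpha) * q_hat(k-1) + alpha * q(k)
    return (1.0 - alpha) * q_hat + alpha * q

def red_drop_probability(q_hat, t_min, t_max, p_max):
    # Piecewise drop probability of Equation 4.1
    if q_hat <= t_min:
        return 0.0
    if q_hat >= t_max:
        return 1.0
    return (q_hat - t_min) / (t_max - t_min) * p_max

def red_on_packet_arrival(q_hat, q, alpha, t_min, t_max, p_max, rng=random):
    """Returns (new queue estimate, whether the incoming packet is dropped)."""
    q_hat = red_update_estimate(q_hat, q, alpha)
    drop = rng.random() < red_drop_probability(q_hat, t_min, t_max, p_max)
    return q_hat, drop
```

Note how a small `alpha` makes the estimate react slowly, which is precisely why a burst arriving at a mostly-empty queue can still overflow the buffer (tail-drop) before `q_hat` catches up.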

Meanwhile, scheduling algorithms (at routers) determine which packets, among those already
buffered, are to be sent next. Scheduling is primarily used to manage the allocation of bandwidth
among competing flows and to provide Quality of Service (QoS) guarantees to specific traffic.
The router must classify each incoming packet, which is done differently by hardware
manufacturers and ranges from simple rules (e.g. matching IP addresses and ports) to more
complex connection-aware algorithms. Routers implement scheduling policies to decide the service
order for each traffic class: logically separated queues are used for each class and are served
based on priorities. A simple, commonly found scheduling algorithm is Weighted Round Robin
(WRR), where queues are visited in round-robin fashion proportionally to assigned weights (each
class gets a bandwidth share proportional to its weight).
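The WRR policy just described can be sketched as follows (illustrative only; class names and weights are hypothetical):

```python
from collections import deque

def wrr_round(queues, weights):
    """Serve one WRR round: each traffic class is served up to its weight."""
    served = []
    for cls, q in queues.items():
        for _ in range(weights[cls]):
            if q:
                served.append(q.popleft())
    return served

# A control-like class gets twice the service share of bulk data traffic.
queues = {"ctrl": deque(["c1", "c2", "c3"]), "data": deque(["d1", "d2"])}
weights = {"ctrl": 2, "data": 1}
```

Over many rounds, each backlogged class receives a bandwidth share proportional to its weight, which is the property the text describes.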

4.2.2 Data Network Simulation Tools


There are several network simulators available both for commercial and academic use.
Some of the most common network simulators mentioned in the literature are: ns-2 [18], ns-3
[19], OMNET++ [20], OPNET [21], JiST [22], just to name a few. MiniNET [109] is widely used
for network emulation. Network simulators vary in several aspects: the discrete-event techniques
and principles (sequential or parallel, replication- or decomposition-based, CPU- or GPU-based)
[23], the library of reusable models, the software interfaces to assist the modeling activity (e.g. to
define a network topology), etc. In some simulation packages network model behaviour and model
topology are defined intermingled in the code. While this allows for great flexibility, the code

can soon become too complex to understand, debug and maintain. A number of simulation tools
(e.g. OPNET, OMNET++, etc) provide graphical editors which allow for an easy and compact
understanding of the network topology, separating topology from model behaviour.
The dominant simulation technique used by the majority of network simulation tools is discrete-event
simulation [3]. Discrete-event simulation is used for all layers of computer networks: signal
processing in the physical layer, medium access in the link layer, routing in the network layer,
protocols in the transport layer, and finally modeling application data flow.
There is a plethora of surveys comparing available network simulators, focusing on diverse
aspects.
Regarding performance, the fairness of comparisons can get very tricky. The performance of any
non-trivial network model in a given simulator depends on many properties of specific tools that
are difficult to compare. For example, a trade-off between performance and available metrics should be
considered together with the metrics relevant for each particular study. The number of simulated
network layers, and of features in each layer, also trades off strongly against performance, and simulating
more features is not necessarily better in all case studies. Also, a simulator could take longer to
simulate the first run of a model, but learn about its structure and simulate subsequent runs faster
(e.g. in a parameter sweeping study); in such cases, batches of runs should be compared. These types
of studies require very thorough considerations that are beyond the scope of our interest, because
our focus is on performance scalability as a system's size grows.
Table 4.2 from [110] provides a non-comprehensive list of network simulator surveys and com-
parisons found in the literature.

Paper | Type | Studied Simulators | Focus
[111] | comparison | Opnet, ns-2, OMNeT++, SSFNet, QualNet, J-Sim, Totem | accuracy, usability
[112] | comparison | ns-3, OMNet++, JiST, ns-2 | performance
[113] | description | ns-2, GloMoSim, Opnet, SensorSim, J-Sim, Sense, OMNeT++, Sidh, Sens, TOSSIM, ATEMU, Avrora, EmStar | overview
[114] | comparison | Opnet, ns-2 | setup, accuracy
[115] | comparison | ns-2, cnet, JNS, Opnet, AdventNet, NCTUns | features (limited)
[116] | case study | J-Sim, ns-2, SSFNet | scalability, performance, memory
[117] | comparison | Opnet, ns-2, QualNet, OMNeT++, JSim, SSFNet | suitability for critical infrastructures
[118] | comparison | ns-2, TOSSIM | architecture, components, models, visualization
[119] | description | GloMoSim, ns-2, DIANEmu, GTNetS, J-Sim, Jane, NAB, PDNS, OMNeT++, Opnet, QualNet, SWANS | overview
[120] | comparison | SSF, SWANS, J-Sim, NCTUns, ns-2, OMNeT++, Ptolemy, ATEMU, Em-Star, SNAP, TOSSIM | models, visualization
[121] | description | OMNeT++, REAL, ns-2, C++Sim, cnet, SSFNet, CLASS, SMURPH | overview
[122] | case study | Opnet, ns-2, testbed | accuracy
[123] | case study | Opnet, ns-2, GloMoSim | accuracy
[124] | comparison | ns-2, OMNet++, ns-3, SimPy and JiST/SWANS | performance
[125] | comparison | ns-2, SSFNet, JavaSim | scalability
[126] | comparison | ns-2, SimPy, OMNeT++ | performance
[127] | comparison | OPNET, NS, OMNeT++ | performance
[128] | overview | ns-2, ns-3, OPNET, NetSim, OMNeT++, REAL, J-Sim, QualNet | features

Table 4.2: Overview of some network simulator comparisons (extended from [110])

4.2.3 Queuing Theory Results for Load-Balancing


Queuing theory is vast and provides important tools for network performance analysis [129]. The-
ory for networks of queues can provide powerful analytic results for computer networks [130].

In this chapter, we shall resort to a specific subclass of results in queuing theory related to
load balancing problems, as they are relevant for our motivating case study in the ATLAS TDAQ
network and the results obtained in Section 4.5.
Load balancing is an essential, often crucial mechanism to improve performance in call centers,
server farms, and various applications that operate on parallel servers. On top of defining efficient
balancing schemes (in terms of delay and blocking probability), applications with a large number
of servers typically require cheap schemes in terms of exchanged messages, required memory and
computing efforts at the dispatcher. There is often interest in characterizing load regimes (ratio
between the amount of incoming load and the service capacity) that can lead to efficient utilization
of resources, and finding rules of thumb to dimension systems with dynamic load balancing schemes.
Two main ideas are usually adopted to gain insights analytically: to assume large scale networks
and then obtain asymptotic results using propagation of chaos, on the one hand, or to restrict the
load balancing schemes to obtain computable bounds, on the other.
Since the 80s strong attention has been given to mean-field results for different classes of
networks with load balancing schemes. In particular, a great deal of research was devoted to prove
mean-field limits for schemes like join the shortest of d among n queues (also called Power of D)
where n is large, starting with the seminal work of [131] and [132], and quickly complemented
by several papers on the mean-field behavior of such systems. Transient functional law of large
numbers and propagation of chaos have been obtained for instance in [133] and [134] for First-In-
First-Out (FIFO) scheduling, and more recently, propagation of chaos properties and asymptotic
behavior of the number of occupied servers were obtained under very general assumptions [135].
All these results concern systems without blocking that are sensitive to the job size distribution.
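As an illustration of the "Power of D" scheme (join the shortest of d among n queues) mentioned above, a dispatcher can be sketched as follows (a toy Monte Carlo sketch under our own simplifying assumptions — instantaneous arrivals, no departures — unrelated to the formal mean-field analysis):

```python
import random

def power_of_d_choice(queue_lengths, d, rng=random):
    """Sample d distinct queues uniformly at random; join the least loaded one."""
    candidates = rng.sample(range(len(queue_lengths)), d)
    return min(candidates, key=lambda i: queue_lengths[i])

# Toy experiment: 500 arrivals dispatched to 100 queues.
rng = random.Random(42)
queues = [0] * 100
for _ in range(500):
    queues[power_of_d_choice(queues, d=2, rng=rng)] += 1
# With d = 2 the maximum backlog stays close to the mean (5 here),
# much closer than with purely random assignment (d = 1).
```

The scheme is attractive in practice precisely because the dispatcher only needs the state of d sampled queues per arrival, not of all n.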
Meanwhile, researchers have considered schemes that lead to the insensitivity of the queuing
system to the job size distribution. This path was first taken in [136] (see also [137], [138] or [139]).
For small networks with a single class of traffic, it was shown that insensitive load balancing
(ILB) compares very accurately to policies that are optimal for a given job size distribution when
estimating blocking probabilities, while delay estimations are less accurate [136], [140].
The penalization imposed by reversibility is greater for multi-class networks while the sensitivity
(of optimal sensitive policies) also deteriorates [137], [139].
Hence, a small-to-moderate price must be paid for robustness and simplicity. It is perhaps
counter-intuitive that for models with infinite buffers this price becomes very high. It was indeed
proven that if the state space is infinite, and assuming absence of blocking, the optimal insensitive
load balancing is static (i.e., it does not depend on the queue-length) and it is hence much less
efficient than a state-dependent sensitive load balancing [141].
Both research directions described above shed light on the possible performance of dynamic
load balancing, but also present strong limitations:

• The performance of the limiting system might not be informative. For instance the blocking
probability or the delay will be 0 for a large class of policies,

• The price to impose insensitivity to the job size distribution is largely unknown.

In [142], the intersection of both research directions was considered by studying the asymptotics
of large networks (i.e., a finite but arbitrarily large number of servers) for ILB schemes. It was
shown that a qualitative phase transition occurs at a critical load $\rho_c(n) = 1 - a\,n^{-\theta/(\theta+1)}$,
where $\theta$ is the buffer depth, $a$ is a constant and $n$ is the number of queues. The blocking
probability is exponentially small until $\rho_c(n)$, then changes to order $O(n^{-\theta/(\theta+1)})$ at the
critical load, and to a higher order thereafter. This generalizes the Halfin-Whitt-Jagerman (H-W-J)
regime established for M/M/n/n systems. In [143], [144], critical regimes for optimal use of resources
were identified for single queues. In particular, the blocking probability for M/M/n/n systems is
$O(n^{-1/2})$ only if the number of servers scales as $\rho + a\sqrt{\rho}$, where $\rho$ is the load of the
system. Before this critical regime the blocking probability is exponentially small, while it is of
constant order after that critical point.
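As a numeric illustration of the critical-load expression above (the values chosen for a and θ below are hypothetical):

```python
def critical_load(n, theta, a=1.0):
    """Critical load rho_c(n) = 1 - a * n^(-theta/(theta+1)) from [142]."""
    return 1.0 - a * n ** (-theta / (theta + 1.0))

# Deeper buffers (larger theta) and more queues (larger n) push the
# critical load closer to 1, i.e. the system can be driven harder
# before blocking stops being exponentially small.
loads = [critical_load(n, theta=2) for n in (10, 100, 1000)]
```

For θ = 2 and n = 1000 this gives ρc ≈ 0.99, i.e. near-full utilization with still-negligible blocking.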
In [142] it was shown that a rule similar to the popular staffing rule established for the
M/M/n/n system is valid, but must depend critically on the value of θ when dynamic load
balancing is employed.
Whether this critical regime is a general phenomenon or not (i.e., is it verified for a large class
of dynamic load balancing schemes?) remains an important and completely open question, which
we will investigate via packet-level simulation in Section 4.5.2.

4.2.4 Applications and Data Flow in the DAQ network


The TDAQ system in the ATLAS experiment at CERN was briefly introduced in the previous
Chapter. The HLT and DAQ network are the motivating case study for the simulation models
in this Chapter, so more specific details on the data flow and distributed applications are given here.

Figure 4.4 shows the communication among various applications in the HLT during Run2.
Upon selection by L1, Event data is transferred to the ROS, and the specialized application HLT
supervisor (HLTSV) is notified. The HLTSV assigns Events to trigger processing unit (TPU)
servers, which run an application called a data collection manager (DCM) to centralize commu-
nication between the TPU and the rest of the system. DCMs interface with instances of the
application processing unit (PU)—one per available core, between 8 and 24 per host. Each Event
is assigned to a single PU instance that analyzes it and decides whether it should be permanently
stored or discarded. This system is the focus of the M&S process in this Chapter.
Applications communicate over an Ethernet network with link capacities of 1 and 10 Gbps.
Two core routers and approximately 100 switches interconnect roughly 2,000 multicore servers
using TCP/IP protocols. The farm is composed of 50 racks for TPU servers and 25 racks for ROS
nodes. Each TPU rack contains from 30 to 40 servers (DCMs and PU applications), and each ROS
rack contains 8 servers. Within each rack, servers are connected to a shared top-of-rack (ToR)
switch via 1 Gbps links. The HLTSV node and the ToRs are connected to the core switches over
10 Gbps links.

4.2.4.1 Other simulation studies of the TDAQ system

Several other studies were taken into consideration for the development of simulation models in
this Thesis.
In [93], an early design of the DAQ system was studied before its construction. ATM and
Ethernet measurements on prototypes were extrapolated with simulation models to evaluate the
feasibility on a larger scale system. Using the discrete-event module of the Ptolemy simulator,
network nodes use queuing models to represent NICs and CPUs. Simulations were used to study

Figure 4.4: TDAQ data flow applications in Run2 (the Level-1 Trigger and Readout Drivers feed ~100 Readout Servers; ~2000 Trigger Processing Servers run PUs and DCMs, with Event assignment coordinated by a single HLTSV; accepted Events flow through the Sub-Farm Output (SFO) to permanent storage at Tier0).

scalability, robustness, fault-tolerance, and what-if scenarios to predict the effect of increasing link
bandwidth and node processing power.
[94] resorts to a mathematical queuing model of the HLT system and sets upper limit require-
ments for the network delay, loss, and processing time. For data networks, authors propose a
component-wise quality attenuation approach to incorporate loss and delay into topological queu-
ing compositions. A correlation between traffic load and quality attenuation is established.
In [95], [96], the ATLAS DAQ system is modeled in OMNET++. Communication patterns,
with particular focus on the TCP incast pathology, are addressed. What-if scenarios
are performed to evaluate the behavior of the system using different traffic shaping and scheduling
policies, and with network hardware modifications.
In [97], [98], a large buffer area envisioned for the DAQ upgrade in 2024 is modeled in the
OMNET++ simulation framework. The model is verified and calibrated against real operational
data, and then evolved toward the future architecture. Simulations allow exploration of different
strategies for resource provisioning, and trade-offs between buffering space and processing
capabilities are studied.

4.2.4.2 The Future FELIX Network

For 2025, the ATLAS experiment is planning full deployment of the new Front-End LInk eXchange
(FELIX) system [146], shown in Figure 4.5, that aims at interfacing between detector electronics and
the TDAQ system. FELIX will replace the custom point-to-point connections with a Commercial-
Off-The-Shelf (COTS) network technology (e.g. Ethernet, Infiniband, Omnipath). FELIX servers
will act as routers between 24-48 detector serial links and 2-4 standard 40Gbps/100Gbps links.

Figure 4.5: FELIX system components (as of 2015) [145].

FELIX servers will communicate with a smaller set of commercial servers, known as Software
ReadOut Drivers (SW ROD), used for data collection and processing of physics data.
When a simulation model of the FELIX system was to be implemented in 2015, different
components were envisioned to connect to the FELIX servers. For example, the Detector Con-
trol System (DCS) monitors and controls the detector front-end electronics while the Control &
Configuration system sets up and manages data acquisition applications. The various types of
traffic differ in their throughput, latency, priority and availability requirements. The DCS elec-
tronics requires the highest priority and low latency to react fast, but is expected to require low
throughput. Meanwhile, the detector’s data will use most of the network bandwidth so it can
have less priority to avoid saturation. Table 4.3 summarizes the different traffic types and their
requirements as designed in 2015. The communication patterns are also different for each type of
traffic. While DCS traffic follows a many-to-one pattern (all FELIX servers communicate with a
single DCS server), Control and Monitoring traffic require a many-to-few pattern. Detector data,
on the other hand, uses a simple one-to-one or two-to-one pattern from a FELIX server to SW
RODs.
Part of this effort consists of designing and implementing a network that can meet the demands
of the system (high-availability, high-throughput, low-latency, redundancy, etc.). Modeling and
simulation supports the design of the network and aids in the decision process (e.g. for selecting
technologies, topologies, node distributions, etc.).

Traffic type | Throughput | Latency | Priority | Comm. Pattern
DCS | Low | Low | High | Many-to-one
Control and Config. | Med. | - | High | Many-to-few
Detector Data | High | - | Med. | One-to-one
Monitoring | High | - | Low | Many-to-few

Table 4.3: FELIX traffic types and requirements.

4.3 Simulation model of the Network layer


According to the questions and goals that drive our M&S efforts, the level of abstraction defined
for the HLT model is at the packet-level. The most basic unit of abstraction is considered to be
an IP network packet and thus the model describes Layers 3 and above of the TCP/IP layered
paradigm. I.e., the model does not consider data-link or physical details such as bit encoding,
transmission errors, electromechanical features of the cable, digital/analog multiplexing, etc.

4.3.1 Modeling the TCP Congestion Control


TCP is widely used in the DAQ network for reliable data transmission and is a central model
for the simulation of the TDAQ system. This section specifies the TCP features included in our
DEVS-based models as well as validation exercises.
There are over a dozen official TCP versions which differ in their control mechanisms and (po-
tentially crucial) implementation details [RFC 7414]. Our simulation model provides a simplified,
yet rich set of TCP mechanisms exhibiting acceptable levels of accuracy for the DAQ problem at hand.
An example of a simplification adopted is the TCP connection initialization and release phases:
in the DAQ network the TCP connections are established and kept alive throughout very long
periods of operation (days, weeks). Thus, our implementation assumes that every TCP state-
machine remains always in the ’ESTABLISHED’ state.
The main TCP mechanisms included in the simulation model are: 1) Sliding Window Congestion
Control, 2) Additive Increase, Multiplicative Decrease (AIMD), 3) Congestion Avoidance
(CA), 4) Slow Start (SS), 5) Fast Retransmit (FR), 6) Triple Duplicate ACK (3DACK), and 7) Fast
Recovery (FR/FR), as described in Section 4.2.
Details on the reusable TCP DEVS-based sender and receiver blocks as implemented for
PowerDEVS can be found in Appendix 7.3.

4.3.2 Preliminary TCP Model Validation Via Comparison With a Network Simulator
Gathering very detailed information on TCP internal dynamics from a real operating system
can prove very challenging, and usually only coarse-grained metrics can be obtained (e.g. averaged
RTT, total drops, etc.). An approach that can be used to increase the credibility of simulation models
is to cross-compare simulation tools on the same model and draw conclusions. We tested and
compared TCP in PowerDEVS against an equivalent model in OMNET++, a well-established
open network simulator. As tools often focus on slightly different features of the same modeled
entity, some differences are expected, but key TCP features should be validated at both ends.
Broader system-level validations will be performed in Section 4.4.3 in the context of the real DAQ
network.
We designed a simple simulation scenario to focus on the basic TCP mechanisms included in
the simulation model. Both simulators were set up with the same topology as shown in Figure 4.6.
A client-server pair of hosts with infinite data to send are connected by a bottleneck link through
an intermediate router with a RED queue. Description of other topology models will be provided
in subsequent Sections. The parameters used are summarized in Table 4.4.

Figure 4.6: PowerDEVS (a) and OMNET++ (b) topologies used to cross-compare TCP behaviour: a TCP sender and receiver attached through 5 Mbps links to an intermediate router with a RED queue and a 3.5 Mbps bottleneck link.

Figure 4.7 and Table 4.5 show comparisons of PowerDEVS and OMNET++ simulations for
detailed and long run metrics, respectively. Figure 4.7 compares the congestion window for 10
seconds of simulation. At the beginning, both simulations start in SS phase showing a quick
increase of the TCP congestion window. At around 0.2s, the SS threshold is reached and the
CA phase starts with a slower increase of the congestion window. The first packet drops occur
at approximately 0.5 s as a consequence of the stochastic RED algorithm. Both models detect
a 3DACK and start FR/FR phases. A different dynamic can be observed only for the first
FR/FR phase, after receiving the first valid ACK. Both models deflate the window back to the SS

PowerDEVS | | OMNET++ |
Parameter Name | Value | Parameter Name | Value
packetSize | 4376 bits | tcpseg | 547 bytes
startTime | 0 s | tOpen | 0 s
bandwidth (host) | 5 Mbps | datarate (host) | 5 Mbps
bandwidth (router) | 3.5 Mbps | datarate (router) | 3.5 Mbps
propagationDelay | 0.0565 us | delay | 0.0565 us
TCP
MSS | 500 bytes | MSS | 500 bytes
WND_SSTHRESH | 131 pkts | ssthresh | 65535 bits
T_RTT | 3 s | RTTVAR | 3 s
RTT_alpha | 0.125 | g | 0.125
DUP_ACK_LIMIT | 3 pkts | DUPTHRESH | 3 pkts
- | - | delayedAcksEnabled | False
- | - | nagleEnabled | False
- | - | sackSupport | False
- | - | tcpAlgorithmClass | TCPReno
RED
tmin | 218.8 Kb | minth | 50 pkts
pmin | 0 | - | -
tmax | 437.6 Kb | maxth | 100 pkts
pmax | 0.1 | maxp | 0.1
alpha | 0.001 | wp | 0.001

Table 4.4: PowerDEVS and OMNET++ configuration parameters to compare TCP behaviour

threshold, but OMNET++ detects an RTO while PowerDEVS detects several consecutive 3DACKs.
This discrepancy is being investigated. Later, both models restart their CA phases. Although with
some differences, both simulators produce qualitatively similar traces of the TCP behaviour under
this scenario. A broader range of tests was reported in [147].
Table 4.5 shows the total data sent, total drops and mean round-trip time captured after 20
seconds of warm-up in a simulation of 400 seconds. Both models yield metrics comparable in order
of magnitude, which we consider satisfactory for the purposes at hand. Further refinements
can be implemented using this cross-tool comparison technique.

Measurement | OMNET++ | PowerDEVS
Total Data Sent (MB) | 158.76 | 159.42
Total Drops (MB) | 0.13 (0.08 %) | 0.58 (0.36 %)
Mean RTT (ms ± std) | 54.63 ± 22.32 | 62.47 ± 25.55

Table 4.5: Comparison of some relevant TCP accumulated metrics in OMNET++ and PowerDEVS
for a run of 400 seconds (the first 20 seconds are not measured, for warm-up purposes).

Figure 4.7: Qualitative comparison of TCP congestion window evolution in OMNET++ and PowerDEVS (annotated with the SS, CA and FR/FR phases and the 3DACK, valid-ACK and RTO transitions).

4.3.3 Modeling Network Topologies


Sticking to the bottom-up modeling approach, different low-level building blocks are composed to
form more complex components. This section briefly describes how the low-level components (such
as queues, generators, delays) are composed to represent high-level components (such as hosts and
routers), and in turn how larger topologies can be built. As a case study, an early design stage of
the FELIX network topology is shown.

4.3.3.1 Packet Data Structures and Low-Level Network Models


Low-level models receive, manipulate and output the data structure shown in Figure 4.8, which
represents a network packet. A Packet is identified by a unique ID and records its creation
timestamp. It can carry any application-specific data in its payload, using the IProtocol hierarchy
to represent different network protocols. Each implementation of the IProtocol interface specifies
the data necessary for that protocol. Some models access only packet data (e.g. a queue model
uses the packet size), while others require protocol-specific data (e.g. a TCP sender model).
For the TDAQ model, abstractions and simplifications are made in the representation of packets.
The packet format does not influence TDAQ behaviour, so only the required fields are included (e.g.
checksum fields are omitted). This simplifies the modeling task, helps focus on relevant
behaviour, and aids in achieving better simulation performance.
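The Packet abstraction just described can be sketched as follows (an illustrative Python rendering of the C++ structures of Figure 4.8; details are simplified and names condensed):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Protocol:
    """Analogue of an IProtocol implementation (e.g. TCP/IP, IP, QoS)."""
    protocol_id: int
    osi_layer: int
    size_bits: int

@dataclass
class NetworkPacket:
    packet_id: int
    birth_time: float
    payload_bits: int = 0
    protocols: Dict[int, Protocol] = field(default_factory=dict)

    def add_protocol(self, proto: Protocol) -> None:
        self.protocols[proto.protocol_id] = proto

    def get_protocol(self, protocol_id: int) -> Optional[Protocol]:
        return self.protocols.get(protocol_id)

    def length_bits(self) -> int:
        # Only fields that influence model behaviour are kept
        # (e.g. no checksums), as discussed above.
        return self.payload_bits + sum(p.size_bits for p in self.protocols.values())
```

A queue model would only call `length_bits()`, while a TCP sender model would inspect its own protocol entry, mirroring the distinction made in the text.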

The PowerDEVS network model library was extended to include more than 40 DEVS atomic
models to represent different functions. A detailed description of the low-level model library is
given in Appendix 7.3. Figure 4.9a shows some of the most relevant ones, which are outlined next.

Figure 4.8: NetworkPacket and IProtocol class diagram (NetworkPacket exposes id, creation-time, length, payload and protocol-manipulation accessors; the IProtocol interface is implemented by protocol-specific classes such as TCPIPProtocol, IPProtocol and QoSProtocol).

• Bandwidth Delay and Propagation Delay: Receive packets and output them after a
delay. The propagation delay model uses a fixed configured delay. The bandwidth delay
model determines the delay based on the packet size and a configured bandwidth. These are
used in conjunction with other models to represent links.
• Tail-drop Queue: Implements a FIFO queue with limited buffering capacity. When the
capacity is exceeded new incoming packets are discarded (tail-drop). Packets are dequeued
when requested from external models which allows reusing the queue for different buffering
mechanisms (e.g. at links, routers, applications, etc).
• Random Early Discard (RED) and Explicit Congestion Notification (ECN): Im-
plement AQM mechanisms based on the incoming packets, tracking a weighted average queue
size. Packets are randomly discarded (or set the ECN flag) based on the calculated discard
probability as described in 4.2. This is used in conjunction with the tail-drop queue to
model RED-enabled routers. There are also priority queue models which use Weighted-
Round-Robin (WRR) to implement Quality of Service (QoS) features.
• TCP sender/receiver protocol: Models the TCP protocol at sender and receiver ends as
described in Section 4.3.1
• Routing Table: Demultiplexes packets incoming at a router based on a routing table. In the TDAQ system routing protocols are omitted, so in that case the routing table model uses static information defined during initialization. Routing tables can be populated in several ways, e.g. per source/destination IP and port, or per flow identifier.

4.3. Simulation model of the Network layer 67

(a) Relevant low-level models (b) High-level models

Figure 4.9: PowerDEVS Packet-Level Model Libraries

• Flow Generator: Generates streams of packets according to parameters including probabilistic distributions for the packet size and intergeneration times. Parameters can include start/stop times, type of service (for QoS), and a route to be taken across the network. This model is used to represent abstract generic applications sending data through the network. TDAQ-specific applications, however, are modeled differently and are explained in Section 4.4.

• Egress Port: Represents the Network Interface Controller (NIC) cards and cables. These are intermediate-level models, as they are composed of queue, bandwidth delay and propagation delay atomic models. Depending on the kind of egress port, the models might also include RED, ECN, priority queues, etc.

• TcpSession: Intermediate-level (coupled DEVS) models that represent a TCP session within a host. Each includes a flow generator (to model an application), a queue (to model the TCP buffer), a TCP sender (to model the TCP sliding-window logic), and a TCPPacketization model (to split messages up into IP packets).
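As a concrete illustration of the RED mechanism described above, the sketch below computes the discard (or ECN-mark) probability from a weighted average queue size; the parameter names follow the classic RED formulation, and the values used are illustrative, not those of the TDAQ models.

```python
import random

class REDQueue:
    """Sketch of Random Early Discard: a weighted average of the queue size
    drives a drop (or ECN-mark) probability between two thresholds."""
    def __init__(self, min_th, max_th, max_p, weight=0.002):
        self.min_th, self.max_th, self.max_p = min_th, max_th, max_p
        self.weight = weight   # EWMA weight for the average queue size
        self.avg = 0.0

    def drop_probability(self, queue_len):
        # Exponentially weighted moving average of the instantaneous length.
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        if self.avg < self.min_th:
            return 0.0         # no congestion: always accept
        if self.avg >= self.max_th:
            return 1.0         # severe congestion: always drop/mark
        # Linear ramp between the two thresholds.
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def on_packet(self, queue_len):
        """True if the incoming packet should be dropped (or ECN-marked)."""
        return random.random() < self.drop_probability(queue_len)
```

With `weight=1.0` the average tracks the instantaneous queue length exactly, which makes the ramp easy to inspect; the small default weight smooths out short bursts, which is the point of RED.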

4.3.3.2 Topologies and High-Level Network Models

High-level components are DEVS coupled models that represent top level nodes in a network
topology, such as routers or hosts. Figure 4.9b shows the new packet-level library included in
PowerDEVS. These models are composed of the low-level atomic models described before.
High-level components can be built in different ways to represent the varied hardware and
software present in real networks. Figure 4.10 shows a topology built up by composing some of the
models in the PowerDEVS high-level network library. The high-level library includes the following
basic components (which can, in turn, assume very different behavior depending on the initial
configuration of their constituent low-level models described above):

• Router/Switch: These models use a RoutingTable atomic model to forward packets to adequate EgressPort models. Different routers are provided in the library to enable tail-drop, RED, ECN, or QoS (priority queues). Routers implement cut-through output queues (input queues and store-and-forward routers are not yet included).

• Host: Models the edge nodes that send and receive data using either TCP or UDP. Multi-
ple TCP sessions can be easily configured and applications are represented by flexible flow
generators.

These basic models often suit well for early stages of network design, when exact details are still unknown (e.g. the exact application logic). An example of this situation is shown in Section 4.3.3.3. Depending on the type of network modeled, high-level coupled models can be incrementally updated with more detailed application behaviour or specific queuing mechanisms at routers. An example of this is the TDAQ model discussed in Section 4.4.

When modeling topologies, high-level models can be interconnected via DEVS ports and links, mimicking real network ports and links. This one-to-one mapping between models and real system entities allows network experts unfamiliar with M&S tools to become productive quickly, as the intricacies of the DEVS formalism are hidden away (although not restricted) from the modeling activity. Our models can be created and connected through the visual user interface, providing a graphical topological view natural to network experts. Figure 4.10 (center) shows a simple topology with 6 hosts and 3 Router/Switches defined in the PowerDEVS graphical interface.

[Figure content: TCPSender1-3 connected through RED-enabled Router1, Router2 and Router3 to TCPReceiver1-3, with ACKs flowing back.]

Figure 4.10: Simple packet-level topology using PowerDEVS network library. Top elements from the high-level library are composed of more basic models from the low-level library.

Yet, this convenient approach can hit a limit on practicality¹ when a topology becomes large. This problem was tackled before with Vectorial DEVS [71], which we use extensively in our models, though that approach is not suitable for defining network topologies. We developed two ancillary code-based tools to tackle this problem at different levels: TopoGen [15] and Py2PDEVS [17].
TopoGen is a Ruby-based tool that can automatically compose large topologies based on structured descriptions of a system. Of particular interest in this Thesis, the topology can be retrieved from (possibly very large) real-world networks compliant with the increasingly adopted Software Defined Network paradigm. TopoGen then generates a simulation model by accessing a description of the network stored at the SDN Controller node. This strategy permits the model to keep up with frequent topology changes in real systems. In the next section we describe an application of TopoGen to build a medium-sized model in the context of TDAQ.
Meanwhile, Py2PDEVS is a Python-based tool that offers a programmatic interface to func-
tions of the PowerDEVS simulator and DEVS atomic models through a well defined Python-C++
interface. Then, by coding in a scripting-like manner using standard Python constructs, DEVS
models in the library can be retrieved, instantiated, parameterized and interconnected, composing
arbitrarily large/complex topologies.
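To illustrate the scripting style enabled by Py2PDEVS, the sketch below composes the topology of Figure 4.10 programmatically. It uses a minimal stand-in `Topology`/`Model` API written for this example; the real Py2PDEVS interface differs.

```python
class Model:
    """Stand-in handle for a parameterized DEVS model instance."""
    def __init__(self, name, **params):
        self.name, self.params, self.links = name, dict(params), []

class Topology:
    """Stand-in registry that instantiates and interconnects models."""
    def __init__(self):
        self.models = {}

    def add(self, name, **params):
        self.models[name] = Model(name, **params)
        return self.models[name]

    def connect(self, src, dst):
        src.links.append(dst.name)   # a directed port-to-port link

top = Topology()
routers = [top.add(f"router{i}", buffer_pkts=1000) for i in range(3)]
hosts = [top.add(f"host{i}", protocol="TCP") for i in range(6)]
for h in hosts[:3]:                  # senders attached to the first router
    top.connect(h, routers[0])
top.connect(routers[0], routers[1])  # router chain, cf. Figure 4.10
top.connect(routers[1], routers[2])
for h in hosts[3:]:                  # receivers attached to the last router
    top.connect(routers[2], h)
print(len(top.models))  # 9
```

The point of the scripting interface is that loops and comprehensions replace repetitive drag-and-drop: scaling this topology to hundreds of hosts changes one number, not the diagram.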
Both TopoGen and Py2PDEVS are general purpose, open and extensible tools for meta-
modeling, focused on the automated composition of topologies, and are not restricted to modeling
network systems. These tools are described with more detail in Chapter 7.

4.3.3.3 Case Study: Creating Larger Topologies


This case study shows the M&S of a medium sized topology based on a real SDN-enabled network.
To focus on our new topology generation strategy and packet-level model libraries, we make the case study simple in terms of the real application details. More complex examples are discussed in Section 4.4.

¹ Quoting Prof. Barry Nelson at the 2017 Winter Simulation Conference’s Keynote, "[don’t] drag and drop until you drag and drop".
The future FELIX network (see Section 4.2.4.2 above) will provide connectivity between different components of the FELIX system (see Figure 4.5) and will handle various types of traffic which differ in their throughput, latency, priority and availability requirements (see Table 4.3). To increase confidence in the coexistence of these traffic types while meeting TDAQ performance requirements, a modeling and simulation approach was used to study the expected throughput and latency, and to anticipate possible bottlenecks.
Although the high-level requirements are well defined by the FELIX team, each subsystem's specification is updated often during the design process. Specific, measurable metrics (throughput, processing times, etc.) will not be known until the real final system is in place. We therefore provide guidelines based on M&S for realistic ranges of candidate parameter values (parameter sweeping).
The FELIX network topology contains more than 100 nodes. The network team uses a continuously evolving Mininet emulated environment to test connectivity options in the topology. We applied TopoGen to eliminate the task of manually defining the topology, making the process faster and less error-prone. Moreover, the simulation model is created automatically from the very topology used by the networking team, helping the simulation team keep the model in sync with the real network.
The TopoGen tool is conceived around a workflow of model transformations. The one used in
this case study is depicted in Figure 4.11.

Figure 4.11: Semi-automated topology modeling workflow with TopoGen for network simulation.

It consists of three phases: automatic topology retrieval, augmentation with dataflow patterns,
and serialization into a simulation model. These are commented below:

• Phase 1 - Automatic topology retrieval: TopoGen retrieves the topology from an ONOS SDN controller installed within the emulated environment. The topology is serialized to Ruby code classes (named NTM in Figure 4.11). Whether the original topology was specified in an emulated environment or in a real network is transparent to TopoGen.

• Phase 2 - Topology Augmentation: Additional nodes are added manually to the topology
to also capture some nodes of the HLT network which were not included in the emulated
environment. Data flows for traffic generated by different servers (along their respective
parameters) are added programmatically. For this case study, only the Detector Data traffic
type and the Monitoring traffic type were considered.

• Phase 3 - Model generation: the augmented topology is serialized into a PowerDEVS model. Router models use tail-drop Queue models; FELIX and HLT nodes use the TCP Host model from the high-level library. Different flow generator configurations are used to represent simplified versions of the FELIX applications.
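The three phases can be read as a pipeline of model transformations. The sketch below mimics the workflow with plain Python dictionaries standing in for TopoGen's Ruby NTM classes and the ONOS controller responses; all names and data are illustrative.

```python
def retrieve_topology(devices, links):
    """Phase 1 (stand-in): build an in-memory topology from controller data."""
    return {"nodes": {d: {"flows": []} for d in devices}, "links": list(links)}

def augment(topo, extra_nodes, flows):
    """Phase 2: add nodes absent from the emulated environment plus traffic flows."""
    for n in extra_nodes:
        topo["nodes"][n] = {"flows": []}
    for src, traffic_type, rate_mbps in flows:
        topo["nodes"][src]["flows"].append(
            {"type": traffic_type, "rate_mbps": rate_mbps})
    return topo

def serialize(topo):
    """Phase 3 (stand-in): emit a flat description a simulator could load."""
    lines = [f"node {n}" for n in sorted(topo["nodes"])]
    lines += [f"link {a} {b}" for a, b in topo["links"]]
    return "\n".join(lines)

topo = retrieve_topology(["felix1", "sw1"], [("felix1", "sw1")])
topo = augment(topo, ["hlt1"], [("felix1", "DetectorData", 900),
                                ("felix1", "Monitoring", 100)])
print(serialize(topo))
```

Separating retrieval, augmentation and serialization is what lets the retrieved topology stay in sync with the SDN controller while manual additions and flow definitions remain isolated in Phase 2.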
For details on the connection with SDN controllers, serialization techniques, NTM intermediate
Ruby classes, and implementation details of TopoGen see [15].

Figure 4.12 shows the final modeled network. It includes the FELIX network nodes (automatically retrieved from the SDN controller), the HLT network nodes, and the Detector Data and Monitoring traffic flows (added programmatically). This provides an example of a real medium-sized topology modeled using the low-level and high-level libraries described earlier in this Section.

Figure 4.12: Topology of the FELIX system.

We are now ready to investigate via simulation the impact of Monitoring traffic on Data traffic
when they are merged. One question to be answered is how average latency changes when choosing
between two candidate link capacities. Figure 4.13a shows the average packet latency using 1Gbps
links for Monitoring flows with increasing throughput in all servers. As Monitoring traffic grows,

(a) 1 Gbps link capacity allocated to monitoring traffic. (b) 10 Gbps link capacity allocated to monitoring traffic.

Figure 4.13: Simulated mean packet latency seen by the Traffic Monitoring servers. Blue area:
standard deviation. Red lines: min-max range.

the latency increases slightly until a transition is observed at the point where each server injects 650 Mbps of Monitoring data. Thereafter the latency increases rapidly, denoting the presence of congestion. The buffer sizes and link utilization at the switches (not detailed in this report) indicate that the source of congestion is the 1 Gbps links at the monitoring servers. Figure 4.13b shows the same experiment but replacing the 1 Gbps link of the Monitoring servers with 10 Gbps links. The figure shows how the saturation point moves up to 6500 Mbps of Monitoring traffic. The congestion point in the topology remains at the links directly connecting the monitoring servers.
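The sharp latency transition near the link capacity is characteristic of queueing systems. As an illustration only (the simulated network is far more detailed than this), a textbook M/M/1 approximation reproduces the shape and shows why replacing 1 Gbps links with 10 Gbps links moves the saturation knee by the same factor of ten; the mean packet size used here is an assumption.

```python
def mm1_latency_ms(offered_mbps, capacity_mbps, mean_packet_bits=12000):
    """Mean sojourn time of an M/M/1 queue: T = 1 / (mu - lambda).
    Purely illustrative, not the thesis' packet-level model."""
    service_rate = capacity_mbps * 1e6 / mean_packet_bits   # packets/s
    arrival_rate = offered_mbps * 1e6 / mean_packet_bits
    if arrival_rate >= service_rate:
        return float("inf")                                 # saturated link
    return 1000.0 / (service_rate - arrival_rate)

# Latency stays almost flat at low load...
print(mm1_latency_ms(100, 1000))    # ~0.013 ms
# ...grows steeply as the offered load approaches the capacity...
print(mm1_latency_ms(900, 1000))    # ~0.12 ms
# ...and 10x capacity moves the knee 10x higher:
print(mm1_latency_ms(9000, 10000))  # ~0.012 ms
```

The same qualitative shape (flat region, knee, steep rise) is what Figure 4.13 exhibits, with the knee tracking the monitoring servers' link speed.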

4.4 Simulation model of the TDAQ Network Data Flow


Following our M&S methodology and bottom-up approach, once the general-purpose TCP and networking stacks are in place, we move on to specific-purpose application-level models for the TDAQ system.
The build cycle begins with observation of the real system (experimentation and metrics acquisition). This consisted of code and documentation review as well as experimentation on the real system, which was accessible during short periods (Technical Runs during Long Shutdown LS1).

4.4.1 Relevant Applications in the TDAQ System (HLT Applications)


The HLT subsystem is composed of different distributed applications that act in a coordinated
way to transport and filter Event data. The Data Flow is a set of applications, libraries and
communication protocols transporting the Event data from the ROS to the PUs in the HLT farm.
Figure 4.4 shows how Event data flows across the different applications, and Figure 4.14 shows how these applications are hosted in the network architecture.
The applications that contribute to the TDAQ Data Flow are described in detail in [1], [89], [148], [149]. Concepts relevant for our M&S tasks are briefly outlined below; they correspond to elements and messages shown in the application sequence diagram in Figure 4.15.

Figure 4.14: Topology and applications in the HLT TDAQ farm for Run2.

• Processing Units (PU): Single-threaded processes (usually one per server core) in charge of analyzing Event data and making a decision on whether to accept or reject each given Event.
- The analysis includes complex triggering algorithms which depend on several physics-related configurations outside the scope of this Thesis.
- Event portions called Fragments are requested in multiple stages, and PUs very seldom need the full Event information to make a final decision.

• ReadOut System (ROS): Set of ∼100 servers and their applications that temporarily
store Event data that was accepted by the preceding L1 Trigger system.
- Data related to a single Event is spread across all ROS servers in the form of smaller
structures: Fragments.
- Each sub-detector is directly connected to a fixed set of ReadOut Drivers (ROD) and
Frontend Electronics (FE) as shown in Figure 3.5.
- ROS applications respond to Fragment requests coming from PUs; Fragments are stored in local buffers until a clear request is received.

• High-Level Trigger Supervisor (HLTSV): Single-server multithreaded application that coordinates the interaction between the L1 Trigger, DCMs and ROSes.
- The HLTSV receives notifications from DCMs about completed (accepted/rejected) Events and sends clear requests to the ROSes to free up buffer space.
- The HLTSV receives accepted-Event notifications from the L1 trigger and designates a PU to process each next Event, using a FIFO discipline to queue up PUs that become free.

• Data Collection Manager (DCM): Applications (one per processing server) in charge of managing the communication between each PU in a server and the rest of the HLT applications (ROS, HLTSV, SFO, etc.).
- From the network point of view, the DCM plays a key role in controlling the traffic with the ROS servers. The DCM adds a custom credit-based (token-bucket type) algorithm, called DCM Traffic Shaping.

Figure 4.15: Sequence diagram of TDAQ applications involved in filtering a single Event. The processing units (PUs) request information from the read-out system (ROS) in two stages: level-two (L2) filtering and Event building (EB).

- Traffic Shaping acts on top of TCP, requesting Event Fragments progressively in an attempt to avoid congestion before TCP needs to come into play.

• Sub-Farm Output (SFO): Set of servers and related applications that store Events accepted by the HLT until they can be sent out to the CERN permanent storage for later offline analysis.
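The DCM Traffic Shaping idea can be sketched as a credit pool: a Fragment request is sent to a ROS only while credits remain, and credits are returned when the corresponding response completes. This is a simplified stand-in for the real C++ algorithm extracted for the simulation model; the per-request credit cost (here 1) is an assumption.

```python
from collections import deque

class CreditShaper:
    """Sketch of token-bucket-style traffic shaping: requests consume
    credits and queue up when none are left; replies return credits."""
    def __init__(self, initial_credits):
        self.credits = initial_credits
        self.pending = deque()    # requests waiting for credits
        self.in_flight = []       # (ros_id, cost) of requests on the network

    def request(self, ros_id, cost=1):
        self.pending.append((ros_id, cost))
        return self._drain()

    def on_reply(self, ros_id):
        for i, (rid, cost) in enumerate(self.in_flight):
            if rid == ros_id:
                del self.in_flight[i]
                self.credits += cost   # reply received: credits returned
                break
        return self._drain()

    def _drain(self):
        """Send as many pending requests as the credit pool allows."""
        sent = []
        while self.pending and self.credits >= self.pending[0][1]:
            rid, cost = self.pending.popleft()
            self.credits -= cost
            self.in_flight.append((rid, cost))
            sent.append(rid)
        return sent

shaper = CreditShaper(initial_credits=2)
print(shaper.request("ros1"), shaper.request("ros2"), shaper.request("ros3"))
print(shaper.on_reply("ros1"))  # the freed credit releases the queued request
```

Few credits underutilize the network; too many allow the bursts that trigger Incast, which is precisely the trade-off quantified later in Section 4.4.3.1.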

4.4.2 The DEVS-Based Model for the TDAQ Data Flow


The TDAQ simulation model was created following the DEVS formalism with a hierarchical and
modular approach. The model represents an abstraction of the HLT system by focusing on the
traffic between the TPU and ROS servers.
Figure 4.16 shows an overview of the model as implemented in PowerDEVS. The full set of
developed models is described in detail in Appendix 7.3.

[Figure content: hierarchical coupled models, including ROS, TPU and PU components.]

Figure 4.16: DEVS TDAQ simulation model implemented in PowerDEVS



At the top level, DEVS coupled models represent the main HLT components described above, connected according to the topology in Figure 4.14 and preserving the real system's semantics and structure. The latter is very important for communicating the model to the network team in a real-world project.
Within TPUServer and ROSServer models, Vectorial DEVS (denoted with a green border) is
used to create multiple (equivalent) model instances that represent the ∼ 2000 processing nodes
and the ∼ 100 ROS nodes respectively. The ROS and DCM coupled models rely on the TCP
sender and TCP receiver blocks described before.
The sequence diagram in Figure 4.15 depicts the interaction between models that take part in
Event filtering. The PUs request information from the ROS (through the DCM) in two stages:
L2 filtering and Event building (EB). In L2, a small portion of the Event is first requested and
analyzed; this step can be repeated several times until EB takes place and all pending information
is requested at once.
TPU servers model the communication with the ROS and HLTSV, and are composed of DCM
and PU models. The DCM model plays a critical role in network traffic.
Thus, key control logic was extracted from the C++ algorithms of the real DCM application and mapped directly into a DEVS model (e.g. the credit-based algorithms), thereby substantially increasing the homomorphism with the real system under study. The same strategy was applied to the HLTSV assignment algorithm, by extracting C++ code directly from the real HLTSV application.
The mapping between PU requests and the specific ROS node which contains the Fragments
is also implemented in the DCM, and is configured to follow realistic request patterns measured
from online metrics.
The PU model represents processing times and Fragment requests in the multiple iterations of
L2 and a final EB stage. Examples of some probabilistic simulation parameters of the PUs are
Acceptance/Rejection probability in each stage, order of fragments requested at each stage, mean
number of L2 steps, and processing time distribution in each step.
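The PU request pattern can be sketched as follows; the distributions (exponential number of L2 steps, uniform choice of ROS nodes) and all parameter values are illustrative placeholders for the ones fitted from real system measurements.

```python
import random

def simulate_pu_event(n_ros=100, mean_l2_steps=3, accept_p=0.05,
                      frags_per_step=4, rng=None):
    """One Event as seen by a PU: a random number of L2 steps, each requesting
    a few Fragments from randomly chosen ROS nodes; accepted Events finish
    with an Event Building (EB) stage requesting all pending Fragments."""
    rng = rng or random.Random()
    requests = []
    steps = 1 + int(rng.expovariate(1.0 / mean_l2_steps))   # at least one L2 step
    for step in range(steps):
        requests.append(("L2", step, rng.sample(range(n_ros), frags_per_step)))
    if rng.random() < accept_p:
        requests.append(("EB", steps, list(range(n_ros))))  # everything pending, at once
        return requests, "accepted"
    return requests, "rejected"

reqs, verdict = simulate_pu_event(rng=random.Random(1))
print(verdict, len(reqs))
```

Even this crude sketch captures the traffic-relevant structure: many small L2 bursts, a rare large EB burst, and a fragment-to-ROS mapping that the DCM translates into actual network requests.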
The simulation model assumes the following simplifications, where the possible impact on the
TPU ↔ ROS communication is explained:
• Network: Focuses on the network traffic between the TPU and ROS servers and assumes
ideal communication (no network) between other components.
• L1 Trigger: L1 Trigger filter is not modeled. The HLTSV is assumed to receive L1 notifi-
cations described by a configurable stochastic distribution.
• ROS: Interaction with the L1 Trigger and storage of fragments (subdetector → FE → ROD path) is not modeled. ROS servers are always able to respond to fragment requests. Fragment sizes follow a configurable stochastic distribution taken from real system measurements, as will be described next.
• SFOs and Permanent Storage: The simulation model uses sinks to represent the Storage. SFO applications and Permanent Storage (Tier 0) are not modeled.
• PUs: Execution of triggering-chain algorithms is abstracted; only processing times, EF/EB stages, and fragment request patterns are modeled. These are configurable stochastic distributions taken from real system measurements, as will be described next.

• Other network traffic: Only traffic related to Event data is considered. Interaction with other traffic (Monitoring, DCS, Configuration and Control, etc.) was studied in [15] but is not described here.
Model parameterization is done considering data from the real system. Parameters for network
models are usually taken from the switch/router model specifications (e.g. switch/router buffers,
link speed, etc). Some model parameters are also parameters of the real HLT applications (stored
in the TDAQ OKS [150] distributed database) and might require basic transformations (e.g. DCM
credits, PUs per server, etc). Other model parameters are taken from the online operation metrics
(stored in the TDAQ PBeast [151] scalable archiving system) and require more advanced analysis
and transformations (e.g. ROS request rate distributions, PU execution times, etc).
We developed a thorough approach to tackle simulation parameterization challenges, tools and
methods for the TDAQ network in [152], not described in this report.

4.4.3 Model Validation and Application: Reproducing and Studying the TDAQ System via Simulation
Two experiments are performed to compare simulated results against real system metrics. These
exercises are guided by two main practical concerns and technical challenges faced by the network
team at TDAQ:
• TCP Incast [153] pathology in ROS → DCM communication: When DCMs request fragments from the ROS servers, all involved ROS nodes send their replies to the same DCM almost simultaneously, creating traffic bursts in the ROS → DCM direction that increase the filtering latency because of the queuing effect generated at the core and rack switches.
- TDAQ has high bandwidth and low latency in relation to the TCP minimum retransmission timeout (200 ms). Together with the data flow described earlier, these conditions often create a throughput collapse known as the TCP Incast pathology [153]. The impact on TDAQ
can be huge: whenever a single TCP packet is discarded at any switch, a PU cannot start processing the Event until that packet is retransmitted (after 200 ms at best), raising the perceived network latency of an Event request from a theoretical minimum of 19.2 ms (for a 2.4 Mbyte Event) to more than 200 ms.
- To avoid the Incast effect, the DCM application restricts the number of simultaneous requests sent to ROS nodes using the credit-based traffic shaping control (introduced before) that limits the number of requests "in flight" on the network [154]. As responses can vary significantly in size, traffic shaping does not completely prevent packet losses, so it is important to study the effects of queue saturation (and TCP retransmissions) and to engineer the network and its algorithms to maximize performance and minimize high-latency risks.
- Simulation goal: Characterize the impact of the DCM Traffic Shaping control algorithm on the Event filtering latency.
• TDAQ Network upgrade: During LS1 the network architecture was updated considerably
[89].
- Switches connecting ROS servers and core routers were removed. The ∼200 ROS
nodes were replaced by 100 more powerful computers with four 10Gbps interfaces, each

directly connected to both core switches. The switches connecting TPUs and core routers
were expanded with additional 10Gbps links to both core switches. The overall throughput
supported at the network level increased by one order of magnitude. The resulting network
architecture is depicted in Figure 4.14.
- Simulation goal: Characterize the Event filtering latency for increasing L1 rates in
the new network architecture.
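For the Incast concern above, the quoted latencies follow from simple arithmetic, assuming a 2.4 Mbyte Event transferred over a 1 Gbps link and TCP's 200 ms minimum retransmission timeout (both link speed and event size are assumptions of this sketch):

```python
def event_transfer_ms(event_bytes=2.4e6, link_gbps=1.0):
    """Ideal (queueing-free) time to move one Event over the network."""
    return event_bytes * 8 / (link_gbps * 1e9) * 1e3

min_latency_ms = event_transfer_ms()   # theoretical minimum, ~19.2 ms
rto_ms = 200.0                         # TCP minimum retransmission timeout
# A single dropped packet delays the whole Event by at least one RTO,
# so the perceived latency jumps by an order of magnitude:
print(min_latency_ms, rto_ms + min_latency_ms)
```

This is why a single lost packet dominates Event latency: the retransmission timeout alone is an order of magnitude larger than the entire ideal transfer time.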
Event filtering latency was selected as the main performance metric to observe, as it is of utmost importance for the TDAQ system. It represents the time elapsed since the HLTSV assigns an Event to a given PU until the Event is either discarded or permanently stored.

4.4.3.1 Traffic-shaping behaviour (application-level)


Following our M&S methodology in Chapter 3, the system experimental frame EFS is first defined
as a subset of the complete system: the HLTSV, all ROS nodes, and a single instance of the DCM
and PU applications. To simplify timing calculations, zero processing time is assumed at the PUs, and Events have a fixed size (2.4 Mbytes). This EFS is representative of the entire system with unlimited resources, as each PU independently processes a single Event at a time. Scaling up this scenario shows emergent behaviors of resource sharing (DCM credits, network bandwidth, etc.).
For this experimental frame, measurements sweeping a real system parameter (θS in the
methodology) were conducted. Figure 4.17a shows the averaged Event filter latency for the initial
DCM credits parameter ranging from 50 to 1500 credits.

(a) Real system measurements. (b) Simulation results.

Figure 4.17: Filtering latency versus initial DCM credits. The red curve shows average latency,
and blue dots show individual latencies; larger dot clusters denote higher number of occurrences.

There is an optimum configuration in which the average latency stabilizes at 20 ms (close to the theoretical minimum) within a range of about 100 to 600 DCM credits. With fewer credits (12 to 100), latency increases (the DCM can send fewer simultaneous requests, underutilizing network capacity). Using more than 600 credits, latency increases rapidly and stabilizes at around 500 ms.
Packet discards were observed on the ToR switches when more than 600 credits were used, thus
confirming that the latency increase is due to network congestion and TCP retransmissions (no
packet loss was observed at core switches).
The simulation was configured to follow the real system setup described earlier (controlled
θS → θM → θC translation), sweeping the number of initial DCM credits. Figure 4.17b shows the
results. The simulation reproduces the individual filtering latencies (blue dots) following the same
clustered patterns which gather around discrete ranges (close to 15 ms, 200 ms, 400 ms, and 600
ms). This validates the emergent behaviour of the DCM credits affecting filtering latencies as a
consequence of TCP dynamics (retransmissions and TCP Incast effect).
The simulated average latency approximates real measured latencies (λS ∼ λC ), with 100 to
600 credits attaining minimum latency and fewer than 100 credits slightly increasing latency. For
credits above 600, the simulation showed congestion and packet drops on the ToR switches, but the
increase in the average latency was much steeper compared to the real system. Another difference
was the stabilization point under congestion: the real system latency stabilizes at 500 ms, whereas
the simulated latency grows up to 700 ms. Although these differences will require further study, the simulation reproduces very closely the intervals of major interest for the question at hand, underlining the well-known trade-off between degree of model detail, simulation accuracy, and delivery time for a given engineering concern.
Latency clusters correspond to emergent behaviour related to exponential backoff in TCP retransmissions. An early version of the TCP model was not able to explain the cluster around 600 ms, which is quite frequent in real measurements.
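The discrete clusters are consistent with exponential backoff: the retransmission timeout doubles after each consecutive loss of the same packet, so the cumulative extra delay takes only a few discrete values (a minimal sketch assuming a 200 ms initial RTO; a cluster near 400 ms can also arise from two independent 200 ms timeouts within one Event):

```python
def cumulative_backoff_ms(losses, initial_rto_ms=200.0):
    """Total extra delay when the same packet is lost `losses` times in a row:
    the RTO doubles after each unsuccessful retransmission."""
    return float(sum(initial_rto_ms * 2 ** k for k in range(losses)))

print([cumulative_backoff_ms(k) for k in range(4)])  # [0.0, 200.0, 600.0, 1400.0]
```

Because the possible delays form this sparse set, individual Event latencies gather in discrete clusters rather than spreading continuously, which is exactly the pattern visible in Figure 4.17.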
A detailed comparison against the real system led to the finding of a bug in the official Linux SCL6 TCP implementation, which is responsible for the unexpected retransmissions (full details on this bug, reported in the official Red Hat issue tracker, can be found at https://bugzilla.redhat.com/show_bug.cgi?id=1203742). The simulation model was then tailored to mimic this TCP bug. We think this is a salient example of a formal model (the DEVS TCP model) serving as a point of reference for a practical implementation (the Linux TCP code, rather intricate to review and understand).
An important advantage of the simulation model is that it allowed for fine-grained analysis
(packet by packet if required). For example, link utilization and queue occupancy can be studied
and visualized in great detail in the simulation, while it is impossible to sample the instantaneous
evolution of queue occupancy at real network devices (e.g. to pinpoint queuing bursts that are
critical for TDAQ, and occur in less than 8ms).

4.4.3.2 Event Building Latency (Network/Application Level)


To gather real system metrics, the system experimental frame EFS is defined using a full rack of TPUs, where the network traffic is largely determined by the HLTSV assignment rate. With a 100 kHz rate for the HLTSV and 50 TPU racks (full farm), each rack should handle Events at 2 kHz. Thus, the new experiments sweep this parameter (θS = HLTSV rate), ranging from 50 Hz up to 4 kHz. To simplify the analysis, a synthetic configuration was used: PUs accept Events 50% of the time, the Event size is 1.3 Mbytes, and the DCMs use 500 credits.
Figure 4.18 shows simulation results for the average Event latency for increasing HLTSV assignment rates. When the HLTSV assigns Events at 50 Hz, latency is minimal (13 ms) because
there is no sharing of resources and the network is completely free when applications start filter-
ing Events. For increasing assignment rates, latency rises as several PUs simultaneously request
Events competing for finite network resources and DCM credits. For rates above around 3.2kHz,
latency increases exponentially as the network approaches a bottleneck point (93% utilization).
Simulations are validated against the real system by replicating previously conducted experiments that sweep the HLTSV assignment rate parameter. Nine experiments were executed, each simulating 60 seconds (180,000 filtered Events in the most stringent case) on three different nodes, completing all simulations in 120 minutes. Figure 4.18 shows that the simulation results closely reproduce the latency curve measured in the real system, with a root mean square error (RMSE) of 63.708. The absolute latency values and network load in the simulation differ from reality within an acceptable range: less than 5 percent difference.
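The reported RMSE is the standard root mean square error between the simulated and measured latency curves; a minimal sketch with hypothetical latency samples:

```python
import math

def rmse(simulated, measured):
    """Root mean square error between paired samples."""
    return math.sqrt(sum((s - m) ** 2 for s, m in zip(simulated, measured))
                     / len(simulated))

# Hypothetical latency samples (ms) at matching HLTSV rates:
print(rmse([13, 50, 300, 700], [14, 48, 310, 690]))  # ~7.16
```

Because the error is squared before averaging, RMSE penalizes the few large deviations near the congestion knee more heavily than many small ones along the flat region.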
[Plot: Averaged Event Build Latency (ms), 0-700, versus HLTSV assigned event rate (Hz), 0-4000, for the real TDAQ system and the simulated TDAQ model.]
Figure 4.18: Average Event latency sweeping the HLTSV assignment rate (200 ROS, 1 TPU rack
with 40 DCMs, 960 PUs) comparing simulation and real measurements. Red and blue backgrounds
show standard deviation.

4.5 Exploring TDAQ Load Balancing Options Through Modeling and Simulation
An Explore Cycle (green cycle in Figure 3.6) was executed to find potential emergent behaviors using the results provided by simulation (λC). Some information is not available in the real system, or is too difficult to gather with detailed granularity for post-analysis goals. On the contrary, full-system simulations generate huge volumes of information at any desired level of granularity.

Figure 4.19 (top) is an example of data analysis performed on the simulation results. It shows how Events are distributed across the farm in different time slots using the HLTSV assignment policy (FIFO). Other policies in the plot will be discussed later.
Simulations were configured with 250 DCMs (75 DCMs with 24 PUs and 175 DCMs with 24 PUs). The reddish area in Figure 4.19 (bottom) shows that DCMs with more PUs receive higher loads. All DCMs are heavily assigned in the first time bins. Another detected system-level behavior is that individual DCMs differ significantly in the number of Events they process, i.e., the color intensities vary noticeably along any single row and along any single column. These observations led us to infer that a potentially uneven load-balancing mechanism might be the cause of overall higher filtering latencies, and motivated deeper M&S-driven studies of load-balancing strategies.

Figure 4.19: Heatmap of the simulated load in the HLT farm for different HLTSV assignment
policies. Tile color represents the maximum amount of PUs simultaneously processed in each
DCM (230 DCM IDs in the vertical axis) in 0.5 s (5 s binned in the horizontal axis).

4.5.0.1 Load-Balancing in the TDAQ Network


As shown in Figure 4.20, a crucial element is the High Level Trigger Supervisor (HLTSV) node, which orchestrates and load-balances the assignment of each new Event to a single Trigger Processing Unit (TPU) server among the approximately 2000 TPUs available.
This is a highly sensitive task from the load balancing perspective, since the HLTSV must assign Events complying with several constraints:
82 Chapter 4. Packet-Level Network Simulation

• it should be fast enough to avoid buffer overflows at the ROS layer

• it should distribute Events fairly among TPUs to avoid overloading their resources

In this context, an important goal is to verify, through simulation studies, different load balancing strategies guided by theoretical insights.

[Figure: incoming Events at λ = 100 kHz arrive at a Load Balancer (Strategy) that distributes the load among n ≈ 2000 parallel servers, each with processing capacity θ = 8–24 events.]

Figure 4.20: Load-Balancing view of the HLT system.

4.5.1 Load-Balancing Model and Studied Strategies


A set of n processor sharing servers is considered, each with speed 1 and buffer size θ, receiving jobs according to a Poisson process of rate λ = ρn, proportional to n, with ρ ∈ R. The job size distribution is assumed generic with a finite first moment. x = (x1, . . . , xn) refers to the number of jobs at each of the n servers. A dispatcher routes each incoming job to one of the servers according to a given load balancing strategy, and λi(x) denotes the resulting arrival rate at the i-th server. When the dispatcher sends an incoming job to a server with xi = θ active concurrent jobs, the new request gets blocked, i.e., it is rejected from the system. Hence, the state space of the number of jobs in the system is finite and the system is always stable.
We denote by B_n^θ the blocking probability of a system with n servers of capacity θ each (i.e., a maximum of θ jobs can be served simultaneously per server). The delay is the service time spent by an arbitrary job within the system.
Load balancing policies can be grouped into centralized and decentralized strategies. Centralized load balancing refers to policies having full information (about the number of jobs in each server) available at the dispatcher. Decentralized load balancing, on the contrary, refers to policies where the dispatcher manages only partial (or local) information.
Usually, centralized policies yield better service times and blocking probabilities, while decentralized schemes minimize the utilization of the communication channels between the processors and the dispatcher (which have limited capacity in real-world systems).
The following schemes were considered in our study:

• Join the shortest queue (JSQ) This centralized strategy dispatches to one (of possibly
many) shortest queues, breaking ties at random. It is optimal for a wide class of job size
distributions but is also known not to be optimal for size distributions with high variance
(see [155] and references therein).
• Insensitive load balancing (ILB) This centralized strategy has been extensively studied
in [136], [139] and more recently in [142]. It has the desirable property to be insensitive to
the job size distribution, i.e., the stationary measure of the number of concurrent jobs in
each server depends only on the first moment of the job size distribution. An incoming job
is routed to server i with the following probability:
    p_i^{ILB}(x) = (θ − x_i) / Σ_{j=1}^{n} (θ − x_j).

This load balancing rule was proved to be optimal (in the sense that it minimizes the blocking
probability for any convex criterion) in the set of insensitive load balancing for a single class
of traffic in [136].
• Join the idle queue (JIQ) This partially centralized strategy uses only as state information
whether each server is idle or not (and dispatches at random otherwise). It is hence a first step
towards decentralization while its efficiency is potentially much better than fully decentralized
schemes.

• Random (RND) This completely decentralized strategy uses no information from the sys-
tem and chooses at random a single server for each new incoming request.
• Power of D (PoD) This partially decentralized strategy corresponds to choosing at random a subset of d servers among the n and then sending the job to the shortest queue within this subset.
• First-Finished-First-Assigned (FFFA) This centralized strategy is the one currently
implemented in the TDAQ farm. It imposes a small computing effort on the dispatcher.
It assigns new jobs to servers in the same order in which they finish processing jobs. The
bootstrap assignment starts with nθ random (unique) assignments.
• Centralized Random (CRND). This strategy uses as its only state information whether
each server is fully busy or not (and dispatches at random otherwise).

Remark 1 (The particular case θ = 1) When θ = 1 all centralized policies coincide, and the
system corresponds to the M/M/n/n queue. Measures such as the mean delay and the blocking
probability can be explicitly calculated and the critical regime corresponds to the well-known
Halfin-Whitt-Jagerman regime.

4.5.1.1 DEVS Model Implementation


The load balancing system described above was implemented in PowerDEVS so that new models
can be readily plugged into the TDAQ network model to test different policies in varied TDAQ
scenarios.
Libraries were expanded to incorporate the new load-balancing models, making them available to any other modeled application. Figure 4.21 (bottom) shows the high-level view of the load balancing model as implemented in PowerDEVS.

[Figure: PowerDEVS block diagram where a Job Generator sends new jobs to a Dispatcher (port In0), which assigns them to the Processor Sharing Servers (port Out0); servers notify finished jobs back to the Dispatcher (port In1) and discard jobs on overflow. The Dispatcher state machine reacts to In0 ? newJob by executing assignment.procID = strategy->getNextProcessor() and assignment.job = newJob, emits Out0 ! assignment, and reacts to In1 ? finishedID with setProcessorIdle(finishedID).]

Figure 4.21: Load balancing model implementation in PowerDEVS (bottom) and Dispatcher model state machine (top).

The JobGenerator model follows parameterizable probabilistic distributions to generate new jobs with desired sizes and send rate λ. New jobs are sent to the Dispatcher model that behaves
according to a DEVS Graphs diagram [156] depicted in Figure 4.21 (top right). Upon receiving
a new job, an external state transition is triggered by the NewJob arriving at the input port
In0. This transition brings the model from the Wait state to the SendAssignment state. The
Dispatcher will remain at the Wait state forever (its autonomous time advance is infinite, depicted
as ta=INF ) unless an external message arrives. Conversely, the lifetime of the SendAssignment
state is zero (depicted with ta=0 ), meaning this is an instantaneous state, which will undergo
an internal transition immediately. The Dispatcher relies on a load balancing strategy to decide
which processor will handle each new job. This decision is made during the external transition
(solid arrow), while the message with the decided assignment is sent out during the instantaneous
internal transition (dotted arrow, back to Wait) through the output port Out0.
The Dispatcher also receives events through the port In1 whenever a server finishes processing
a job, in which case it notifies the ID of the processor to the strategy. All strategies described in
the previous section are implemented following a single class hierarchy (as shown in Figure 4.21, top left) and they all implement a common IDispatcherStrategy interface, which decouples the Dispatcher and strategy logics. This modeling approach facilitates the addition of new strategies, which only need to implement the common interface, without changing the Dispatcher logic or the rest of the system.
The Processor Sharing Servers model uses Vectorial DEVS [71] to automatically create N instances of the same DEVS model (depicted with a green border), possibly with different values for its parameters. Each server model, identified by its index i, has a parameterizable finite capacity θi and processing power ci (the latter all fixed to 1 in this work).
When a server receives an assigned job from the Dispatcher, the current number of jobs being processed xi is checked: if the server is already at capacity (xi = θi) the job is discarded; otherwise the job is accepted and processed.

4.5.2 Simulation Results


4.5.2.1 ILB policy for Finite Servers
Simulations of the ILB policy are performed to characterize how the system behaves with a finite number of servers. The results are compared against the closed-form expression of the blocking probability for the limiting system (n → ∞) in the scaled regime [142]:
Theorem 1 For a ∈ (−∞, ∞), let

    nρ = n + a n^{1/(θ+1)}.    (4.2)

Then,

    lim_{n→∞} B_n^θ · n^{θ/(θ+1)} = ( ∫_0^∞ exp( a u − u^{θ+1}/(θ+1)! ) du )^{−1}.    (4.3)

In the sequel, the normalized blocking probability is defined as:

    B̄_n^θ = B_n^θ · n^{θ/(θ+1)}.

Figure 4.22 shows the ILB blocking probabilities for systems with different numbers of servers, with the case n = ∞ representing the theoretical limiting formula (4.3). Note that the x-axis is expressed in terms of the parameter a, which is linked to the load nρ through formula (4.2) and facilitates reading off limiting values when n → ∞.
Figure 4.22a shows results for high loads, where the blocking probability of the simulated systems gets closer to the theoretical system as n increases. Note that the expected asymptotic blocking probability for large ρ (B_n^θ → 1) is not predicted by the formula (which shows that the limits in n and ρ do not commute here). For light loads, the formula predicts quite well the blocking probabilities for all systems (independently of the number of servers).
Figure 4.22b is a close-up view around the critical load a = 0, with a log scale for the ILB blocking probability. The figure shows that the theoretical formula predicts very well the inflection point of the ILB blocking probability in the critical regime. The formula precisely predicts B̄_n^θ = 0.5 for all systems (independently of the number of servers) at the critical load (a = 0). For loads in the vicinity of a = 0, the theoretical predictions become accurate for n > 100 and deteriorate for large |a|.

In particular, for critical loads lower than a = −5 the theoretical formula shows a significant bias and predicts much lower blocking probabilities than those obtained with simulations. It is interesting to observe that all simulated systems start their phase transition (passing from an exponentially small blocking to a polynomially small blocking) at approximately the same normalized load (a ≈ −5) and with approximately the same normalized blocking probability (B̄_n^θ ≈ 10⁻⁶). As expected, this is not predicted by the asymptotic formula (4.3) (which does not depend on n).

4.5.2.2 Performance Comparison for Different Policies


A performance comparison of the different policies is presented in Figures 4.23a and 4.23b for blocking probabilities and mean service time, respectively. All strategies present small blocking probabilities when compared to the random strategy, and all start increasing quickly after ρ > 1. Regarding the mean service time, CRND shows the worst processing times. JIQ, JSQ and Power-of-20 are the best performing policies and show a close-to-optimal delay at low loads with a rapid increase in the critical regime. The rest of the policies dwell in the middle, with a softer increase of the delay for higher loads. Under heavy traffic conditions (ρ ≥ 1), all policies behave almost identically, with delays close to the maximum given by µθ. The Random policy shows an interesting (perhaps surprising) trade-off between a reasonable delay and a (very) high blocking probability.

4.5.2.3 Critical Regimes for Efficient Policies


Simulations are performed to characterize how different policies perform compared to the theoretical closed-form expression for the ILB blocking probability.
We define Class C policies as those for which

    lim_{n→∞} B_n^θ · n^{θ/(θ+1)} = κ(a) ∈ (0, ∞).

For policies of Class C the blocking probability is exactly of order O(n^{−θ/(θ+1)}) for large n, with κ(a) the corresponding constant depending on the normalized load a. By definition, the ILB policy belongs to C.
This leads to a convenient rule of thumb for Class C: choose n servers for an incoming load of n + a n^{1/(θ+1)}, with the resulting blocking probability estimated by formula (4.3).
One way to check whether a policy belongs to C is to plot the normalized blocking probability and verify whether it transitions from very small values to O(1) as a changes. The simulations in Figure 4.24 show the normalized blocking probability and evidence that there indeed exist consistent classes of policies that share the same phase transition (i.e., the same critical load depending on n and θ). Conversely, various less efficient policies do not share the same characteristics. For the efficient policies, the dimensioning rule of thumb can be applied.
These observations suggest the validity of a generalization of the Halfin-Whitt-Jagerman scaling to a large class of efficient policies.
Also, the results in Figure 4.24a (normalized blocking probability for ILB as compared to other strategies) indicate that the theoretical formula (4.3) could be applied in practical scenarios as an estimate of lower/upper bounds for the blocking probability. For example, in the TDAQ case study, ILB exhibits blocking probabilities very similar to FFFA, and it is expected not to be far from the theoretical closed formula (easier to calculate) for a large number of servers (n ∼ 200).

Finally, in view of the simulation results we can propose the following conjectures:
Conjecture 1 The JSQ policy, the JIQ policy and the FFFA policy belong to C.
Conjecture 2 The Power of D policy does not belong to C for any fixed D (i.e., not depending
on n).

Conjecture 3 There exists d(n) ≤ n such that the Power of D(n) policy belongs to C.

(a) ILB blocking probability at high loads

(b) ILB blocking probability at low loads

Figure 4.22: Normalized blocking probability for ILB scheduling with different numbers of servers. Theta=3, Service=exp(1). Bars represent the standard deviation.

(a) Normalized blocking probability for different strategies

(b) Mean service time for different strategies

Figure 4.23: Comparison of different strategies. Theta=3, Service=exp(1). Bars represent the
standard deviation.

(a) Normalized Blocking Probability

(b) Mean Service Time

Figure 4.24: Performance of different strategies in the critical regime. N=200, Theta=3, Service=exp(1).

4.5.2.4 Job Size Sensitivity Analysis


We characterize via simulations the impact of different job size distributions, compared to the exponential distribution assumed by most theoretical analyses. A generic conjecture for processor sharing servers (and, more generally, for any symmetric scheduling) is that all the limits are insensitive to the job size distribution as n → ∞. But it remains an open problem to quantify the sensitivity for a fixed n.
Figure 4.25 compares the performance of several policies for different job size distributions (with mean equal to 1 in all cases). Figure 4.25b shows that all policies have a service delay that is almost insensitive to the job size distribution. Regarding the blocking probability, ILB and Random are, as expected, fully insensitive, while other policies show a slight difference across distributions. In particular, for 2-valued distributions most policies perform much better.
Simulations showed that sensitivity is limited in all policies for n < 50, and almost nonexistent for n ≥ 50. These results suggest that for medium to large systems (n > 50), dynamic load balancing schemes are robust to statistical variation in the service distribution, at least regarding blocking probabilities.

(a) Normalized Blocking Probability. (b) Mean Service Time.

Figure 4.25: Sensitivity of load balancing strategies to service time distribution

4.5.3 Real System Improvement Proposal


Behaviors discovered in the Explore Cycle from the previous section motivated us to run a Hypothesis Cycle (orange cycle in Figure 3.6).
First, new load-balancing algorithms are tested in the TDAQ simulated domain. Later, in view of the improved performance shown by the simulation, the same changes are implemented in the real system, as suggested by the M&S methodology. Finally, the real system results match the simulation predictions accurately.

4.5.3.1 Testing the Hypothesis on the Model


The linear increase of the latency in Figure 4.18 is the effect of several PUs competing for the network and for DCM credits (reddish tiles in Figure 4.19).

The proposed hypothesis is: under an improved assignment policy, the sharing of resources should be reduced, with the direct consequence of improving the overall Event filtering latency.
Guided by the results in the previous sections, JSQ is chosen as the optimal policy. In the sequel, JSQ is also referred to as LEAST_BUSY_DCM, a more meaningful name in the TDAQ context.

Simulations are performed to compare the CENTRALIZED_RANDOM algorithm, the FFFA algorithm implemented in TDAQ, and the proposed LEAST_BUSY_DCM algorithm. The same experiment as in Section 4.4.3.2 is performed (sweeping the HLTSV rate), but configured with 9 TPU racks (267 DCMs), 24 PUs per DCM (6408 PUs in total), 87 ROS servers with 20 channels each, fragments with a fixed size of 1 Kb, and 1740 fragments per Event (Event size = 1740 Kb). Additionally, PUs are configured with an acceptance rate of 50% and do full Event building (i.e., Events are not requested in several small batches).
Figure 4.19 shows that LEAST_BUSY_DCM effectively balances the load of all DCMs in the farm, reducing the number of PUs simultaneously processing in each DCM (tile colors exhibit more similarity along rows and columns).
Figure 4.26 shows simulation results (light dashed lines) comparing the CENTRALIZED_RANDOM and LEAST_BUSY_DCM algorithms. FFFA is omitted because it eventually becomes equivalent to CENTRALIZED_RANDOM, as shown in Figure 4.19.
The LEAST_BUSY_DCM algorithm maintains the average Event latency close to a minimum (16 ms) for all frequencies below 24 kHz. For higher frequencies, the latency grows exponentially due to network congestion.
These results suggest that the newly proposed algorithm could reduce latency by a factor of two to four for this specific configuration (design rate of 15 kHz with a network saturation point of 23 kHz).

4.5.3.2 Implementation and Validation in the Real System


After verifying the hypothesis with simulation, the next step is to implement changes to validate
against the real system.
It was possible to reuse some C++ code developed for the simulation models directly in the real applications (e.g., the LEAST_BUSY_DCM algorithm). Some adaptations were necessary to attain better performance in the real multi-threaded environment.
The same experiment was performed in a controlled environment in the real system: the HLTSV rate was swept using nine TPU racks and artificially generated Event data (synthetic Events, with random processing times in the PUs taken from previously collected real metrics). Figure 4.26 shows the result of comparing the CENTRALIZED_RANDOM and LEAST_BUSY_DCM algorithms in the real HLT network (solid lines). With the new algorithm and rates under 24 kHz, the average latency is kept to a minimum and shows improvements of two to four times compared to the current FIFO algorithm, as predicted by the simulation.
On the other hand, LEAST_BUSY_DCM implies more operations, and the HLTSV server was able to sustain an Event rate of only up to 75 kHz.
Simulations predict the DAQ network behaviour when applying changes in the control logic of the HLTSV load-balancing algorithm. The Root Mean Square Error (RMSE) between the simulation predictions and the real measurements is 3.24 and 15.97 for the LEAST_BUSY_DCM and RANDOM policies, respectively. This shows that the model is capable of reproducing measurable behaviors, and represents a valuable tool to provide insights on the impact of changes in the real system.
[Figure: Averaged Event Build Latency (ms) versus HLTSV Event rate (kHz, 0–25), for real and simulated results with the LEAST_BUSY_DCM (new) and RANDOM (current) policies.]
Figure 4.26: Comparison of assignment policies (RANDOM versus LEAST_BUSY_DCM) and
simulation predictions versus real metrics. Error bars represent standard deviation. The RANDOM
algorithm exhibits the same behavior as the FIFO algorithm, while the new algorithm maintains
average Event latency close to a minimum (16 ms) for all frequencies below 24 kHz.

4.6 Conclusions
In this Chapter, the development of a packet-level simulation model under the DEVS formalism was described.
New protocol, network and DAQ application models are now part of the reusable PowerDEVS network library. The TCP protocol model was compared against the OMNET++ implementation, showing acceptably similar behaviour.
The modular construction of topologies was described, supported by graphical interfaces, and a case study showed the effective automatic generation of a medium-sized topology retrieved directly from SDN controllers.
Real metrics from the TDAQ network were taken in two different case study scenarios and compared against simulation results. These studies showed that simulations reproduce relevant behaviour of the DAQ network and applications, specifically a Traffic-Shaping application and the Event Build Latency.

Then, load balancing strategies were studied through extensive simulations from the queuing theory perspective. Simulations showed that there is a class of efficient policies for which a common critical regime can be identified and interpreted as a generalization of the Halfin-Whitt-Jagerman regime for one-server systems. This can provide new insights for future theoretical studies.
These findings motivated the implementation of an alternative load-balancing strategy to distribute Events among TDAQ processors. Simulations showed improvements in the overall Event filtering time. In turn, this improved load-balancing strategy was implemented in the HLTSV application and tested in a controlled environment within the real TDAQ network. Measurements from these tests showed that the simulation predictions matched the real measurements accurately.
This Chapter offered evidence for the suitability of our new packet-level library to represent complex high-speed networks, while the next Chapter will focus on performance scalability relying on fluid approximations.
All packet-based discrete simulators scale, in one way or another, at least linearly with the size of the system, and so does our discrete approach.
But how does the performance of our DEVS-based library compare against other simulation tools?
Our assessment is that the order of magnitude of our packet-level simulation times lies within the expected ranges shared with other tools. To provide an intuition, simulations in PowerDEVS for the models presented in Section 4.4.3 resulted in execution times comparable to those reported for similar scenarios using the OMNET++ tool [96]. The order of magnitude is ca. half a minute per virtual simulation second for 1/50 of the full HLT network.
It remains a broad area of research to find techniques to automatically optimize DEVS-based packet-level models (i.e., beyond the usual code reviews for performance enhancements), for instance by automatically reducing the hierarchy complexity (model flattening, see e.g. [157]). Such symbolic manipulation is only possible by relying on a formal model specification, as is the case with DEVS.
Summary: Packet-Level Simulation

This Chapter describes the development of a packet-level model under the DEVS formalism, validated against other simulators and complex real-world scenarios.
In a first stage, new models of network protocols and DAQ applications are developed as part of the reusable PowerDEVS library (see Figure 4.9). The model of the TCP protocol is compared with the OMNET++ implementation, exhibiting acceptable behaviour and similarity (see Figure 4.7 and Table 4.5).
Then, the modular construction of topologies is described, both through graphical interfaces and through automatic or programmatic generation (see Section 4.3.3.2). A case study is presented for a real medium-sized topology taken directly from SDN controllers.
Subsequently, the models for the TDAQ applications are described which, together with the protocol models, are used to represent the data flow of the filtering network (see Section 4.4). Real metrics are taken from the TDAQ network in two different scenarios and compared with simulation results: a traffic-control application and the characterization of the Event filtering latency. Both demonstrate that the simulations reproduce the relevant behaviour of the network and of the TDAQ applications (see Figures 4.17 and 4.18).
Finally, load balancing strategies are studied from the queuing theory perspective using extensive simulations. The simulations suggest the existence of a class of efficient policies for which a common critical regime can be identified and interpreted as a generalization of the Halfin-Whitt-Jagerman regime for one-server systems. This can provide new insights for future theoretical studies (see Section 4.5).
These findings motivated the implementation in simulation models of an alternative load balancing strategy for the TDAQ processors. The simulations indicate improvements in the total filtering time. Consequently, the improved load balancing strategy is implemented in the real HLTSV application and tested in a controlled environment. The measurements verify that the simulation predictions match the real measurements with good accuracy (see Figure 4.26).

Chapter 5

Fluid-Flow Network Simulation

Science is the captain, and practice the soldiers.

Leonardo da Vinci

5.1 Introduction and Motivation


Packet-level simulation, discussed in the previous Chapter, represents an important tool to help design and evaluate new network topologies and protocols. Yet, it is well known that simulation performance scalability issues arise when the complexity of the network grows (due to topology complexity, throughput intensity, or a combination of both). Such issues often impose a limitation on the quality and/or time-to-delivery of the answers that can be obtained via simulation.
Simulation execution time in packet-level simulations is directly proportional to the number of packets that traverse the network per unit of time and to the number of traversed nodes in the topology. As network technologies evolve, packet-level simulations will face stronger performance limitations. In Figure 5.1, current and future Ethernet speeds show an exponential growth in bandwidth [41]. Network topologies also grow fast, from current massive clusters to grids and the Internet itself.
The ATLAS networks are not an exception. One can refer to the LHC schedule (3.2), where the peak and integrated luminosity are planned to increase steadily at least until 2035. The luminosity in the detectors is related to the amount of data generated and, ultimately, to the throughput that will be required by the triggering networks.
This situation widens the gap between the performance capabilities of current network simulation techniques and real-world networks. While packet-level simulations can cope with current small-to-medium sized networks, we can expect issues in simulating very large-scale topologies and/or very high-speed networks.
The network modeling and simulation community has dealt with simulation performance in several different ways [2], such as simulation parallelization, coarse- vs. fine-grained models, fluid-flow vs. packet-level abstractions, and hybrid (fluid and packet) simulation approaches [23], to name a few. Each strategy imposes its particular limitations and brings about new problems. For instance, topology splitting and model distribution for parallel simulation can offer gains only in

Figure 5.1: The Past, Present, and Future of Ethernet [41].

cases where there is light inter-subnetwork traffic, while clock synchronization techniques can also present technical difficulties [158].
Fluid-flow models, briefly introduced in Chapter 2 (and the focus of this Chapter), propose a higher abstraction level to represent averaged (fluid) packet data rates, instead of resorting to a packet-by-packet approach. These analytical models can be much faster to solve numerically and yield results with acceptable accuracy when compared to fine-grained packet-level simulations. On the other hand, there inevitably exists a trade-off of accuracy for speed: in fluid approximations only the first-order moment (mean value) is preserved for probabilistic sequences of discrete events (i.e., of packet arrivals and departures from network nodes).
Several fluid-flow approximations have been proposed and refined incrementally since the early days of packet network modeling [42]. We are interested in fluid models represented by sets of Ordinary Differential Equations (ODEs). They have been successfully applied to study complex end-to-end dynamics in TCP data flows, initiated in [43] and [24] and continued in this work.
We identify, though, several limitations in the existing methods and tools for ODE-based fluid-flow network simulation. At the heart of the fluid approximation approach is the need to verify results against packet-level simulations. Unfortunately, existing methods and tools for network simulation are of a very different nature than those required to solve ODEs. In network modeling, the natural approach is to compose topologies modularly, through the interconnection (links, queues) of network nodes (hosts, routers, etc.), each one embedding particular behavior (discrete event algorithms) to deal with streams of packets. In ODE numerical solving, different tools (e.g. Matlab, Octave, SciPy, etc.) provide algorithms that require the specification of equations in forms that are alien to network descriptions. The modeler is then left with the task of implementing a packet-level model, inspecting the network topology and the discrete behavior of its nodes, coming up with a set of approximating ODEs, encoding them in a separate tool (or developing a custom ODE solver), simulating both systems separately, and comparing results. We claim that this approach is heavyweight, error prone, and hinders true synergy between specialists in the discrete and continuous domains of network M&S. The network modeler needs to be well acquainted with ODEs, numerical solving methods and their correct implementation. These are well-studied topics but are generally not part of the knowledge of network experts and designers. Also, developing and maintaining customized code is time-consuming and error-prone for ODE experts.

In this Chapter, we propose novel methods for efficient modeling and simulation of fluid-flow network approximations.
We present a modular and scalable integrated approach to combine the modeling and numerical solving of fluid network models along with their packet-level counterparts, under a unified and consistent mathematical description and practical tool. Modularity provides the modeler with the ability to graphically interconnect self-contained models of network elements that can embed either a packet-level algorithm or its fluid-flow approximation, depending on the task at hand. In the case of fluid models, the overall set of ODEs gets automatically defined and ready to solve under a discrete event-based framework. Basic network nodes can then be reused to create arbitrary, possibly complex topologies, without the need to manually redefine a new set of ODEs for each new simulation scenario.

One step further is to consider hybrid models, where fluid models are integrated and interact
with well-known packet-level simulators. These models offer the performance advantages of
fluid models while preserving the detailed results of packet-level models. On the other hand, this
poses the challenges of packet-level simulation together with the need to master fluid-flow ODEs
and numerical solvers. Moreover, the time management of discrete-event simulators and
discrete-time solvers must be synchronized.

In Section 5.6 we propose a novel hybrid approach that takes advantage of representing packet-
level and fluid-flow models under the same formalism. Consequently, there is no need to synchronize
time, as both models, fluid and packet, are represented as discrete events of the same class.

5.2 Preliminaries and Related Work


5.2.1 Fluid-flow Network Simulation
In Section 2.1 the fluid-flow simulation approach was briefly introduced. We review here fluid-flow
models available in the literature and present details for those that inspired new models for this
Thesis. The Chapter ends with Table 5.8, which compares available fluid-flow models with our
original contributions.
The underlying purpose is to obtain sufficiently approximated results with simulation speedups
reaching orders of magnitude when compared to packet-level models.
A first step in this approach is to assume multiple contiguous packets equally spaced in time
and model them as a single packet-train [159]. Another approach is to abstract away small time
variations and consider multiple packets as a constant fluid rate [160] allowing buffer size to be
calculated analytically.
FluidSim [26] proposes a model for ATM networks following an event-driven approach in which
events are associated only with rate changes in fluid flows. A custom scheduling algorithm is
proposed to handle event types specific to this model, and a form of integration is performed for
piecewise constant throughput rates.
A widely referenced model in the literature, which we use as baseline for our work, is the one
proposed by Misra, Gong, and Towsley in [44]. We shall hereafter refer to it as the MGT model or
the MGT study. A set of ODEs is developed relying on a Stochastic Differential Equation (SDE)
analysis to describe the behavior of TCP on a network of routers implementing Random Early
Detection (RED, 4.2.1).
The first versions of the MGT model were empirically validated with real network traffic [43].
Moreover, it was shown that when the network is scaled to infinity fluid models capture the limiting
behavior of TCP and follow the sample path behavior [161]. MGT was later extended and other
versions of TCP (NewReno, SACK) were proposed in [24]. This triggered several research efforts
including short-lived TCP sessions in [47], explicit congestion notification (ECN) in [46], and paral-
lelization techniques using GPUs [10]. In [81] an early version of the MGT model was implemented
using QSS, where only a single flow class is considered and equations describe the dynamic of a
single queue that cannot be connected in tandem.
For the MGT, the ODEs were originally solved in Matlab and, with later additions, using a
fixed-step discrete-time Runge-Kutta (RK) algorithm programmed in C [24]. The custom RK
solver includes ad-hoc hooks to handle time delays present in the equations. Figure 5.2 shows
the simulation flowchart used for this particular set of ODEs, which specifies the order in which
variables' values are updated. During the initialization phase a model reduction procedure takes
place to remove inactive nodes from the topology, which greatly reduces the computation time as
fewer variables must be computed at each discrete-time step.
Although fluid-flow models make it feasible to simulate large data networks accurately and
efficiently, they are still far from being adopted by network experts.
On the one hand, correctly developing (and even understanding) ODEs that model real world
dynamics requires a strong mathematical background. On the other hand, a set of ODEs representing
a full network gets defined by a combination of the topology (a macro property) and the dynamics
of each node (a micro property) living in that particular topology. Furthermore, once ODEs are
obtained, they must be re-expressed into formats specific to an existing numerical solver of choice.
Alternatively, the network modeler ends up faced with the task of coding an ad-hoc ODE
solver himself, where several new risks and sources of error arise.
Moreover, for experts in ODEs and numerical methods, the effort to develop and test new
network models is twofold. On the one hand, to solve ODEs they need to code and maintain
software to translate topologies into equations, possibly implement ODE solvers for particular
challenges such as Delay Differential Equations (DDEs) (we shall see this in detail in Section 5.2.3),
and in some cases develop custom simulation cycles. On the other hand, to verify new models
they still need to develop packet-level models in a completely different software context.

Figure 5.2: Flowchart of Fluid Model Solver for the MGT model in [24]

5.2.2 Hybrid Network Simulation


There is also some related literature for hybrid models, those that merge packet-level and fluid-flow
models.
In [25], the authors propose equations for the different phases of the TCP protocol and of queue
states. A transition from one discrete state to another is generated by state events (conditions
on variables' values). Numerical solvers available for the Modelica language were used for the
simulation. The authors found that the simulation time grows essentially linearly with the number
of discontinuities in the continuous variables, which is proportional to the drop rates.
FluNet [162] proposes a fluid-flow model where queuing dynamics are not tracked but instead
replaced by an equivalent rate-based model. This is justified by the fact that queue lengths fluctuate
on a faster time-scale than end hosts, and a discrete-time solver would require decreasingly smaller
step sizes as the system size scales up. For the hybrid simulation, a portion of the network is
simulated at packet level (ns-2 simulator) and some routers use fluid-flow (FluNet). At the edges
(packet-level/fluid-flow), ingress interfaces average packet arrivals and forward rate information into
FluNet, while egress interfaces apply fluid metrics to discrete packets.
The authors of the MGT model proposed to integrate the fluid-flow model with the ns-2
packet-level simulator [9]. The network is split by placing nodes either in a fluid-flow subnetwork
or in a packet-level subnetwork. For the interaction of both types of simulation, two incremental
approaches are proposed. They refer to a "one-pass model" where fluid metrics of background
traffic are retrieved from the Runge-Kutta solver and applied to discrete packets to influence
foreground traffic. In a "two-pass model" an additional pass is required to transform the packet-
level traffic into a fluid-flow representation, then solving the resulting fluid model.
The fluid solver is synchronized to the ns-2 simulator periodically at a fixed smoothing
interval, which must be chosen smaller than the minimal propagation delay of the links that are the
last hops on the packet paths inside the fluid network. Experiments show correct hybrid simulations
reaching speedups of up to 6.53 as compared to the pure packet-level simulation.
In [10], the authors combine the foreground packet-level discrete-event simulation on CPUs
with background fluid-flow numerical calculations on GPUs. The authors propose a "fix-up
computation" as an approximation method for mixing fluid and packet flows within a fixed Runge-Kutta
interval. They accept that the approximation may introduce errors due to possible queue overflow
events, which are expected not to be significant when using sufficiently small Runge-Kutta step
sizes (which can be seen as a convergence towards the basic first-order Euler method). Experiments
show that the GPU-assisted hybrid model can achieve substantial performance improvements over
the CPU-only approach, while still maintaining good accuracy.

In all cases, the proposals maintain separate simulation techniques to represent fluid-flow and
packet-level simulations, namely discrete time and discrete events, respectively. This leads to
the need to implement clever ad-hoc synchronization algorithms to make the different time
management systems match, such as constraints in choosing step sizes, smoothing of packet-level
traffic, and handling possible approximation errors. Special considerations must be enforced to
synchronize correctly the simulation clocks of the discrete-event packet simulator and the time-
stepped fluid solver. Moreover, different tools must be maintained for the discrete and continuous
domains, and each approach (packet-level simulators and ODE solvers) must be mastered separately.
An exception is the work presented in [81], a predecessor of the work in this Thesis, where
DEVS is used to represent both packet-level and fluid-flow models under the same formalism and
tool (using QSS to approximate ODEs). For the interaction of fluid and packet models, the packet-
level queue accepts a new type of hybrid packet structure for each new discrete packet. Hybrid
packets contain the original packet size plus an artificial hybrid size that represents the number of
fluid packets that should be interspersed in between adjacent discrete packets. The service time is
calculated based on both sizes, thus effectively influencing queuing wait time with fluid metrics.
Under the assumption that background throughput is much larger than foreground traffic, fluid
flows can affect packet-level flows. Yet, the inverse is not possible. Also, as mentioned
before, the fluid model can only represent a single overall background flow, and does not allow
hybrid queues to be connected to create fluid topologies.
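As a rough sketch of how such a hybrid size could influence service time (the linear formula and the numbers below are our illustrative reading of the scheme in [81], not its exact implementation):

```python
# Illustrative sketch (not the exact implementation in [81]): a hybrid packet
# carries its real size plus an artificial "hybrid size" standing for the
# fluid traffic interspersed before it; the server is busy for both.
def hybrid_service_time(packet_bits, hybrid_bits, capacity_bps):
    # Queuing wait time is thus influenced by the fluid (background) metrics.
    return (packet_bits + hybrid_bits) / capacity_bps

# A 1500-byte packet preceded by 9000 bits of background fluid on a 1 Gb/s link:
print(hybrid_service_time(1500 * 8, 9000.0, 1e9))
```

With zero hybrid size the expression degenerates to the usual packet-level service time, size/capacity.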

5.2.3 Delay Differential Equations


Delay Differential Equations (DDEs) are a particular case of ODEs where the derivative of a state
variable depends not only on current but also on past values of the state variable itself.
DDEs play an important role in several models for diverse domains such as neuroscience,
demography, disease spreading, etc., and also in network dynamics. In [80] a novel numerical
method was recently developed in order to solve DDEs with the QSS methods, leading to the
Delay QSS (DQSS) algorithms. DQSS was then used in [81], [163] to simulate models of network
dynamics such as the well-known fluid approximation of TCP flows in [44]. We shall revisit this
model in this Thesis.
Yet, as we shall soon see, DDEs are not enough to describe certain network dynamics: we will
need to generalize the fluid modeling of basic Buffer-Server systems.
A general DDE structure for a state variable x(t) can be expressed as follows [164]:

\[ \frac{dx}{dt}(t) = f(t,\, x(t-\tau_1),\, \ldots,\, x(t-\tau_n)), \qquad t \ge t_0 \tag{5.1} \]
DDEs can be categorized depending on the dynamics of the delay as follows:

1. Constant delays: τi = Ki

2. Delays dependent on time: τi = τi (t)

3. Delays dependent on time and the state itself: τi = τi (t, x(t))

DQSS methods can simulate all these types.


Just to provide a quick intuition, think of the derivative of a flow (bits per second) injected by
a controlled sender. The control tries to prevent causing congestion, and will depend on signals
reporting on the state of the network. Said signals arrive after a transmission delay, which in turn
can be a function of the load of the network produced by the sender itself. In this setting, x(t) can
be the flow injected by the sender and its derivative can depend on a function dx/dt(t) = f(x(t - τ)),
where x(t - τ) represents the flow injected in the past (which justifies the delay perceived τ units
of time later).
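The intuition above can be exercised with a minimal constant-delay DDE (type 1 in the list), integrated with a fixed-step scheme that keeps a history buffer. The feedback gain, target rate and delay below are illustrative assumptions, not part of any model in this Thesis:

```python
# Toy delayed-feedback sender: the injected rate x(t) is corrected using the
# rate observed one delay tau ago, dx/dt = k * (target - x(t - tau)).
# All parameters are illustrative; the history list stores past samples of x.
def simulate_dde(target=10.0, k=0.8, tau=1.0, h=0.01, t_end=30.0):
    n_hist = int(tau / h)            # number of samples spanning one delay
    history = [0.0] * (n_hist + 1)   # initial function: x(t) = 0 for t <= 0
    x, t = 0.0, 0.0
    while t < t_end:
        x_delayed = history[0]       # oldest sample approximates x(t - tau)
        x += h * k * (target - x_delayed)
        history.pop(0)
        history.append(x)
        t += h
    return x

# For small k*tau the delayed feedback is stable and converges to the target.
print(round(simulate_dde(), 3))
```

Making tau depend on t, or on the state itself, turns this into the harder DDE types 2 and 3 listed above.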

5.2.4 Simulation of Discontinuous Systems


Fluid modeling of data network queues entails particular challenges to accurately simulate the
limited buffer capacity. We present these challenges here within the more general topic of the
integration of discontinuous systems.
Classic numerical integration algorithms are based on Taylor–Series expansions (see Equation
(2.6)). Simulation trajectories are approximated by polynomials in the step size h around the
current time t∗ . This causes problems when dealing with discontinuities since polynomials never
exhibit such behavior. Since the step size is finite, the integration algorithm does not recognize an
intermediate discontinuity as such.
A typical textbook example of a discontinuous model is the bouncing ball. The following
equations consider a ball in free fall while it is in the air (x(t) > 0, sw = 0); when it touches the
ground (x(t) ≤ 0, sw = 1) it follows a spring-damper behaviour.

\[ \dot{x}(t) = v(t) \tag{5.2} \]

\[ \dot{v}(t) = -g - sw(t)\,\frac{1}{m}\left(k\,x(t) + b\,v(t)\right) \tag{5.3} \]

\[ sw = \begin{cases} 0 & \text{if } x(t) > 0 \\ 1 & \text{otherwise} \end{cases} \tag{5.4} \]

where m is the mass of the ball, g is the gravitational constant, b is the damping constant, and k
is the spring constant.
Figure 5.3 shows the simulation results when using Runge-Kutta with different step sizes and
model parameters m = 1, b = 30, k = 10^6, and g = 9.81. All simulations match until the
first bounce and then notably differ. Another visible error when using h = 0.002 is that the 7th
bounce is higher than the previous one. The problem in these results comes from the fact that
the simulations integrate through the discontinuity that occurs when the ball touches the ground.
Simulations assume a continuous function between instants tk and tk + h when actually at some
point tk < t* < tk + h the function abruptly changes from a free fall to a spring-damper dynamic.

Figure 5.3: Bouncing ball simulation result using Runge-Kutta with step sizes h = 0.002, h = 0.001,
and h = 0.0005 [165]
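The effect in Figure 5.3 can be reproduced with a forward Euler sketch of Eqs. (5.2)-(5.4); the step sizes below are illustrative. Because the switch sw is only evaluated at grid points, the discontinuity is integrated through, and trajectories computed with different h disagree after the first bounce:

```python
# Forward Euler integration of the bouncing ball (Eqs. 5.2-5.4) with the
# parameters quoted in the text: m=1, b=30, k=1e6, g=9.81. The switch sw is
# tested only at grid points, so ground contact is detected late and the
# stiff spring-damper phase is resolved poorly for large h.
def bounce(h, x0=1.0, t_end=1.0, m=1.0, b=30.0, k=1e6, g=9.81):
    x, v, t = x0, 0.0, 0.0
    while t < t_end:
        sw = 0 if x > 0 else 1                # discontinuity: ground contact
        a = -g - sw * (k * x + b * v) / m     # Eq. (5.3)
        x += h * v                            # Eq. (5.2)
        v += h * a
        t += h
    return x

# The same model with two step sizes disagrees after the first bounce,
# illustrating the error discussed around Figure 5.3.
print(bounce(h=1e-3), bounce(h=1e-5))
```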

Variable-step integration methods are often able to pass through discontinuities, but in quite
an inefficient way. Step-size control algorithms will detect sudden, very steep gradient changes
and will consequently reduce the step size. The step size is made smaller and smaller until the
algorithm finally gives up, either because the step size reached the smallest tolerable value or because
the step-size control is getting fooled. As a consequence, the discontinuity is passed through with
a very small step size, the steep gradients disappear, and the step size is cautiously increased again.
Although inefficient, this usually produces decent results, but in many cases it does not
[5].
If the step size is not sufficiently small, discontinuities can be even more problematic for discrete-
time methods. Figure 5.4 shows the results using a variable-step Runge-Kutta method (ode45
of MATLAB) with two different accuracy settings to depict the problem. Using a sufficiently
large tolerance value, the method totally skips the discontinuity and is not able to detect the ball
touching the ground.
A classical way to avoid these situations relies on a rather simple idea: all that is needed
is a variable-step method which performs a step at the exact instant t* where the discontinuity
occurs. This way, the method would always integrate a continuous function before t* and another
continuous function after t*.
For time events, in which the instant at which the discontinuity will occur is known in advance,
this idea works in classical numerical integrators without much change: the known times at which
discontinuities occur are scheduled, and whenever t + h >= t* the method is set up to restart at
simulation time t* instead of continuing with t + h.
On the contrary, for state events the instants at which discontinuities occur are a function of
the state variables and thus cannot be known ahead of time. This is the case of the bouncing ball
problem, and the idea of executing a step at the exact instant t* where the discontinuity occurs is
not so simple. The functions that determine discontinuities must be evaluated at every time step. If
a discontinuity condition is detected, the method should start an iterative procedure to find the
instant t* backwards in time with a given precision. Once the iteration finishes, the method can
restart from the time which was found.
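The iterative localization of t* can be sketched with a bisection inside the offending step. The free-fall example, step size and tolerance below are illustrative:

```python
# Sketch of state-event localization: integrate with a fixed step, detect a
# sign change of the event function (here x(t), ground contact of a falling
# ball), then bisect inside the last step to find t* to a given precision.
def euler_step(state, h, g=9.81):
    x, v = state
    return (x + h * v, v - h * g)       # free fall: x' = v, v' = -g

def locate_event(x0=1.0, h=0.01, tol=1e-10):
    t, state = 0.0, (x0, 0.0)
    while True:
        nxt = euler_step(state, h)
        if nxt[0] <= 0.0:               # event crossed inside [t, t+h]
            lo, hi = 0.0, h             # bisect on the sub-step length
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if euler_step(state, mid)[0] <= 0.0:
                    hi = mid
                else:
                    lo = mid
            return t + 0.5 * (lo + hi)  # t*: the method restarts from here
        t, state = t + h, nxt

# For free fall from 1 m the exact event time is sqrt(2/9.81) ~ 0.4515 s;
# the located t* differs only by the Euler discretization error.
print(locate_event())
```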
These techniques for handling discontinuities in discrete-time methods incur an important
computational cost. In systems where discontinuities occur faster than the continuous system
dynamics, simulation times can become prohibitive.
Coming back to data networks, similar issues were reported in [10] in the context of buffer size
approximation in hybrid simulations, where discontinuities occur at buffer underflow and overflow.
Given the queue length approximated with ODEs, the simulation needs to determine whether a
packet will cause the network queue to overflow. The fluid model only calculates the differences
in queue length at each Runge-Kutta step of size h, and the length is assumed to stay constant in
between intervals. This approximation may introduce important errors.
Figure 5.5 shows an example of how fluids and packets are integrated in [10]. Suppose that at
times t and t + h the fluid model calculates the network queue lengths q(t) and q(t + h), respectively.
The problem occurs when packets arrive during the Runge-Kutta time interval between t and
t + h, when the fluid queue length at time t + h is not yet available. Figure 5.5a shows a situation
where a packet which is supposed to be admitted is dropped due to queue overflow. Figure 5.5b
shows a packet being admitted when it is supposed to be dropped. The Runge-Kutta step size must
be kept small to maintain numerical accuracy. A similar situation occurs when the queue length gets
close to zero, where clever ad-hoc mechanisms are used to correctly schedule packet delays. Linear
interpolation of the values at the Runge-Kutta steps t - h and t allows the approximation
error to be reduced.
Figure 5.4: Event skipping in discrete-time algorithms [165]

Figure 5.5: Mixing of fluids and packets. Approximation errors due to queue overflow [10].
(a) The third packet is supposed to be admitted but is dropped. (b) The third packet is supposed
to be dropped but is admitted.

In QSS methods, all the problems related to discontinuities disappear. While in discrete-time
methods discontinuities are challenging to handle because trajectories are approximated by functions
that do not exhibit discontinuities (polynomials of Taylor-Series), in QSS the discontinuities
in the input and quantized variables are in fact responsible for the time advance [5].
On one hand, QSS integrators provide dense piecewise constant, linear or parabolic output for
the state variables according to the order of the method. This makes it possible to analytically
calculate the instant at which a state variable will trigger a discontinuity condition. The analytical
calculation implies solving the root of a polynomial (fast), and the method guarantees that the value
will always stay within the QSS error boundaries. On the other hand, the arrival of a discontinuous
event is handled naturally and without requiring a restart. In QSS, discontinuities occur regularly
even in systems without discontinuities, as the input trajectories qi(t) are discontinuous. Thus, a
discontinuity condition in QSS has the same effect and is handled just like a normal integration
step [165].
Figure 5.6 shows the QSS simulation results for an enhanced bouncing ball problem that
considers a ball moving in two directions (x and y), bouncing down a stairway, with the addition of
air friction as described in [5].

Figure 5.6: A ball bouncing down stairs using QSS2 [5]

5.3 Modeling a Fluid Buffer-Server System


A central component in network simulation is the buffer-server model, which can represent different
FIFO queues across generic network topologies. Queues are found in routers and switches as well as
in server NICs. Many different buffering techniques exist (e.g. shared buffers, priority queues, etc.)
but they all rely on a basic tail-drop behavior, which does not accept new incoming elements once
a maximum capacity is reached.
From a fluid dynamics point of view, a buffer-server system can be seen as a finite reservoir
which gets filled when the incoming flows are bigger than the outgoing ones. For a buffer-server
system we are also particularly interested in distinguishing multiple simultaneous flows contributing
to the shared buffer size, each with its own individualized output and discard rates.

We propose the following set of differential equations to represent a network buffer-server system
with finite capacity and multiple input flows:

\[
\frac{dq(t)}{dt} =
\begin{cases}
\sum_i a_i(t) - C, & \text{if } 0 < q(t) < Q_{max} \\
 & \text{or } \left( q(t) = 0 \text{ and } \sum_i a_i(t) - C > 0 \right) \\
 & \text{or } \left( q(t) = Q_{max} \text{ and } \sum_i a_i(t) - C < 0 \right) \\
0, & \text{otherwise}
\end{cases}
\tag{5.5}
\]

\[
\mu_i(t) =
\begin{cases}
0 & q(t) < Q_{max} \\
\dfrac{a_i(t)}{\sum_j a_j(t)} \left( \sum_j a_j(t) - C \right) & q(t) = Q_{max}
\end{cases}
\tag{5.6}
\]

\[
d_i(t) =
\begin{cases}
a_i(t) & q(t) = 0 \\
\dfrac{a_i(t-\tau)}{\sum_j a_j(t-\tau)}\, C & q(t) > 0
\end{cases}
\tag{5.7}
\]

\[
\tau(t) = \frac{q(t - \tau(t))}{C}
\tag{5.8}
\]
where:

• Qmax and C are parameters that define the maximum buffer size and the output capacity
of the buffer-server system.

• q(t) represents the buffer (queue) size at time t.
The buffer size will grow/shrink according to the difference between the output capacity C and
the sum of all input rates, as shown by Equation (5.5). When the buffer size reaches its
limits (0 or Qmax) the growth rate is 0 to guarantee that the limits are not exceeded. More on these
conditions is discussed later.

• a1(t), ..., an(t) represent the arrival (input) rates at time t of flows 1, ..., n respectively.
Input rates are assumed to be arbitrary functions here. We shall see later that in the context of a
network topology input rates are defined by server rates or departure rates of neighbouring
queues.

• μ1(t), ..., μn(t) represent the drop (discard) rates at time t of flows 1, ..., n respectively.
Packet discards occur only when the buffer is full (q = Qmax), and the total discard rate equals
the difference between the sum of all input rates and the service capacity C. The total discard
rate is shared among all flows proportionally to their current input rates, as shown by Equation
(5.6).

• d1(t), ..., dn(t) represent the departure (output) rates at time t of flows 1, ..., n respectively.
When the buffer is empty (q = 0) there is no queuing wait time, so the departure rate of each
flow equals its arrival rate. When the buffer is not empty (q > 0), the router sends at
full capacity, sharing the bandwidth among incoming flows proportionally to their input rates,
as shown by Equation (5.7). Incoming flows are delayed before departing according to the
experienced queuing waiting time described by τ(t).

• τ(t) is the experienced queueing delay, i.e. the waiting time that has been experienced by a
packet departing now, at time t.
Note that τ(t) is different from the current queuing delay, i.e. the waiting time that will be
experienced by a packet arriving now, at time t, which corresponds to q(t)/C.

These equations present certain characteristics that make their numerical simulation challenging,
and that will be addressed in the following subsections, namely:

• Equations (5.5), (5.6), and (5.7) exhibit sharp discontinuities in the evolution of the
differential equation when the buffer occupancy reaches certain limits. Depending on the simulation
method, important considerations must be made to handle the switching conditions efficiently
(see Section 5.2.4). Section 5.4.4 discusses the approach taken in this Thesis.

• Equation (5.5) is an ODE, and Equations (5.7) and (5.8) use retarded dynamics, i.e. they depend
on time delays. Yet, in Section 5.3.1 we show that the system of equations is not characterized
as a typical Delay Differential Equation.

• In Equation (5.8) we find an implicit form for the delay τ(t). That is, the value of τ(t) is
expressed in terms of τ(t) itself. Section 5.4 describes different approaches for this particular
kind of dynamics.

5.3.1 Mathematical Characterization

If we look at Eq. (5.5), it is a classic ODE with no particular characteristics other than the boundary
conditions that must be observed when the queue is either empty or full. Let us shift our attention
for a moment only to the cases when 0 < q(t) < Qmax in order to remove the intricacies of
handling discrete events within the simulation of a continuous system. We are then left with the
simpler form dq(t)/dt = Σ_i a_i(t) - C, which can be straightforwardly solved with QSS methods.

But we are also particularly interested in determining the behavior of the departure rates di(t).
When q(t) > 0 we have that di(t) depends on q(t - τ(t)) as in Eq. (5.7), which is a delayed version
of the state variable q(t) in Eq. (5.5).
Is this system a Delay Differential Equation (DDE)?

Let us work out Equations (5.5)-(5.8) in the following way:

\[ \frac{dq(t)}{dt} = \sum_i a_i(t) - C \tag{5.9} \]

\[ \frac{dq(t)}{dt} = \sum_i a_i(t) - \sum_j d_j(t) \tag{5.10} \]

\[ \frac{dq(t)}{dt} = \sum_i a_i(t) - \sum_j \frac{a_j(t-\tau(t))}{\sum_k a_k(t-\tau(t))}\, C \tag{5.11} \]

\[ \frac{dq(t)}{dt} = \sum_i a_i(t) - \sum_j \frac{a_j(t - q(t-\tau(t)))}{\sum_k a_k(t - q(t-\tau(t)))}\, C \tag{5.12} \]

\[ \frac{dq(t)}{dt} = \sum_i a_i(t) - \frac{C}{\sum_k a_k(t - q(t-\tau(t)))} \sum_j a_j(t - q(t-\tau(t))) \tag{5.13} \]

We can rewrite Eq. (5.13) as:

\[ \frac{dq(t)}{dt} = f(t,\, p_1(t - q(t-\tau)),\, \ldots,\, p_i(t - q(t-\tau))) \tag{5.14} \]

where

\[ p_i(t) = \frac{a_i(t)}{\sum_j a_j(t)}, \tag{5.15} \]

\[ f(t) = \sum_i a_i(t) - C \sum_i p_i(t - q(t-\tau)) \tag{5.16} \]

Eq. (5.14) appears similar to a DDE in the form of Eq. (5.1). Yet, it does not fall into any of the
known DDE flavors presented above.
The delayed components pi(·) are not directly the variable being differentiated (as in Eq. (5.1))
but are functions that use the state variable q(·) to represent delays. That is, the differentiated
variable appears affecting the delay itself. Moreover, q(·) also appears retarded, in the form q(t - τ).
But again, as q(·) shows up only modulating the delay of a function, this makes Eq. (5.14) a
mathematical object of a different kind than a DDE.
The correct characterization of this buffer-server system, to the best of our knowledge, is that
of a Functional Differential Equation (FDE) [166], in particular of the Retarded type (RFDE).
A direct implication of this assessment is that existing DQSS methods are, a priori, not the
appropriate tool to numerically solve the RFDE system (5.5)-(5.8), which in turn includes an implicit
formulation for the delays in (5.8).

5.4 Numerical Solving of Retarded FDEs with Implicit Delays

We address now the implicit nature of the delay τ(t) as defined in Equations (5.7)-(5.8) of the
fluid buffer-server system, as well as key aspects for its numerical approximation. We repeat the
equations here for convenience:


\[
d_i(t) =
\begin{cases}
a_i(t) & q(t) = 0 \\
\dfrac{a_i(t-\tau(t))}{\sum_j a_j(t-\tau(t))}\, C & q(t) > 0
\end{cases}
\tag{5.7}
\]

\[
\tau(t) = \frac{q(t - \tau(t))}{C}
\tag{5.8}
\]
Incoming packets of the i-th flow are represented at time t by their arrival rates ai(t). They
will be served after a (dynamically changing) queuing time τ(t). Eq. (5.8) shows the implicit
definition, where τ(t) depends on τ(t) itself.
Solving for this dynamic delay in buffer-server systems is challenging, and it can greatly influence
the departure rates di(t) we are interested in, as will be shown in Section 5.5.5. In remarkable
studies such as [44] this delay is simply not considered.
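Before presenting both approaches, note that Eq. (5.8) can in principle be solved pointwise by fixed-point iteration, tau <- q(t - tau)/C, whenever the history of q is available and the queue growth rate stays below C (so the iteration is a contraction). The history function below is an illustrative assumption:

```python
# Pointwise fixed-point solution of the implicit delay in Eq. (5.8),
# tau(t) = q(t - tau(t)) / C, given a stored history of the queue size.
# The iteration contracts when |dq/dt| < C along the history.
def experienced_delay(t, q_hist, C, tol=1e-12, max_iter=200):
    tau = q_hist(t) / C                  # initial guess: current delay q(t)/C
    for _ in range(max_iter):
        new = q_hist(t - tau) / C
        if abs(new - tau) < tol:
            return new
        tau = new
    raise RuntimeError("fixed point did not converge")

# Illustrative history: queue growing as q(t) = 2*t, served at C = 4.
# Eq. (5.8) becomes tau = 2*(t - tau)/4, whose exact solution is tau = t/3.
q_hist = lambda t: 2.0 * max(t, 0.0)
print(experienced_delay(6.0, q_hist, C=4.0))
```

This per-instant solve is only a diagnostic; the two approaches below remove the implicit form altogether.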
In the following, we will address the solution of the RFDE system with implicit delays in the
context of the QSS theory, resorting to two different approaches:

1. In Section 5.4.1 we shall derive analytically an ODE system that is equivalent to the original
RFDE but without an implicit form for the delays. The resulting system can then be solved
using standard QSS blocks for ODEs and DDEs developed in [8] and [80].

2. In Section 5.4.2 we approach the problem from a different perspective. We develop a new
algorithm that is able to solve the implicit delay directly, by programming events forward in
time, yielding a new Forward DQSS method (FDQSS).

Finally, in Section 5.4.3 we compare both approaches.

5.4.1 Transforming Delay Dynamics


One possible approach to deal with the implicit form of Equation (5.8) is to model the delay τ as
a dynamical system in its own right. Let us first define the following tools:

• Auxiliary function: u(t, τ) = t - τ(t)

• Chain rule: \( \frac{d}{dx} f(g(x)) = f'(g(x)) \cdot g'(x) \)

• Partial derivatives rule: \( \frac{df(x, y(x))}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial x} \)

Starting from Eq. (5.8), we take the derivative on both sides and apply the tools listed above:

\[ \tau(t) = \frac{q(t - \tau(t))}{C} \tag{5.8} \]

\[ = \frac{1}{C}\, q(u(t, \tau(t))) \quad \leftarrow \text{auxiliary function} \tag{5.17} \]

\[ \frac{d\tau(t)}{dt} = \frac{1}{C} \cdot \frac{dq(u(t, \tau(t)))}{dt} \tag{5.18} \]

\[ = \frac{1}{C} \cdot \frac{dq(u(t, \tau(t)))}{du} \cdot \frac{du(t, \tau(t))}{dt} \quad \leftarrow \text{chain rule} \tag{5.19} \]

\[ = \frac{1}{C} \cdot \frac{dq(u(t, \tau(t)))}{du} \cdot \left( \frac{\partial u}{\partial t} + \frac{\partial u}{\partial \tau} \frac{d\tau(t)}{dt} \right) \quad \leftarrow \text{partial derivative} \tag{5.20} \]

Equation (5.20) holds for any form of the buffer dynamics q(·). By substituting now the specific
buffer size dynamics (5.9) into (5.20) we get:

\[ \frac{d\tau(t)}{dt} = \frac{1}{C} \left( \sum_i a_i(u(t, \tau(t))) - C \right) \left( 1 - 1 \cdot \frac{d\tau(t)}{dt} \right) \tag{5.21} \]

\[ = \frac{1}{C} \left( \sum_i a_i(t - \tau(t)) - C \right) \left( 1 - \frac{d\tau(t)}{dt} \right) \tag{5.22} \]

\[ = \frac{\sum_i a_i(t - \tau(t)) - C}{\sum_i a_i(t - \tau(t))} \tag{5.23} \]

Eq. (5.23) has the same general form as Equation (5.14). Again, the delayed variable ai(·) is not
the same variable being differentiated, τ(·), while the latter still appears within the delay
itself. Equation (5.23) is therefore still a Retarded FDE. Yet, it no longer presents any implicit form.
We now want to apply the QSS family of methods to solve (5.23). According to the quantization
procedure described in Section 2.5 we would get the following quantized system.
Note: let us momentarily use x(t) for a generalized state variable and qx(t) for its quantized
version, so as not to confuse it with q(t), used so far for the buffer (queue) size.

\[ \frac{dx(t)}{dt} = f(t - x(t)) \tag{5.24} \]

\[ \frac{dx(t)}{dt} \approx \tilde{f}(t - q_x(t)) \tag{5.25} \]

Unfortunately, the QSS theory provides guarantees of convergence and stability only for systems
quantized as:

\[ \frac{dx(t)}{dt} = f(t, x(t)) \tag{5.26} \]

\[ \frac{dx(t)}{dt} \approx \tilde{f}(t, q_x(t)) \tag{5.27} \]

where the quantization function responds to \( q_x(t) = Q(x(t), \Delta q) \) as shown in Section 2.5.
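A minimal sketch of the QSS idea behind Eq. (5.27), first order and with hysteresis omitted for brevity: the quantized state q_x is piecewise constant, and each quantum crossing, computed analytically, is itself the "discontinuity" that advances time (the test equation is an illustrative choice):

```python
# Minimal first-order QSS-style integrator for dx/dt = f(qx), cf. Eq. (5.27).
# The quantized state qx is held constant between events; the time to the
# next quantum crossing is computed exactly, not searched for. Hysteresis
# (part of the full QSS definition) is omitted in this sketch.
def qss1(f, x0, dq, t_end):
    t, x = 0.0, x0
    qx = x
    while t < t_end:
        dx = f(qx)                   # derivative is frozen while qx holds
        if dx == 0.0:
            break                    # equilibrium reached: no further events
        dt = dq / abs(dx)            # exact time until x moves one quantum
        if t + dt > t_end:
            x += (t_end - t) * dx    # partial advance up to t_end
            break
        x += dt * dx                 # x changed by exactly +-dq
        qx = x                       # quantum crossing = next event
        t += dt
    return x

# Illustrative test: dx/dt = -x from x(0) = 1; the QSS trajectory stays
# within a band of width ~dq around the exact exp(-t).
print(qss1(lambda q: -q, 1.0, dq=0.001, t_end=1.0))
```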
With the following theorem we prove that if a generalized aggregated arrival function a(t) is
globally Lipschitz at all times, then a QSS simulation of the RFDE in Eq. (5.23) provides guarantees
of numerical asymptotic convergence at any desired accuracy.

Theorem 2 Consider the following RFDE on the state variable x(t):

\[ \frac{dx(t)}{dt} = \frac{a(t - x(t)) - C}{a(t - x(t))} \tag{5.28} \]

and assume that a(t) is (globally) Lipschitz with a(t) ≥ ε > 0 ∀t and C > 0.
Then, the QSS approximation of (5.28) converges to the analytical solution, i.e., the global
error goes to zero when the quantum ∆Q goes to zero, for any initial condition x(t = 0).

Proof: See Appendix 7.1 

Relying on Theorem 2 we are now ready to use standard preexisting QSS and DQSS blocks in
a simulator such as PowerDEVS to solve Eq. (5.23) as shown in Figure 5.7.

Figure 5.7: PowerDEVS implementation to obtain the dynamic delay as defined by Eq. (5.23)

5.4.2 New Forward Delay QSS (FDQSS)


An alternative option is to reformulate Equations (5.7)-(5.8) in such a way that the delay is
expressed forward in time.
This is shown in Equations (5.29)-(5.30). This new form also removes the implicit definition
of the delay, which is a goal we also pursued in the previous section. A similar forward delay
approach was reported in [24] within the context of a Runge–Kutta discrete-time solver. This Section describes a new method to solve generic forward delay equations in the context of QSS. As we shall see, it will not be required to include new state variables in the system, as was the case with Eq. (5.18).
Our reformulated system with forward delay is the following:

\[
d_i(t + \tau(t)) =
\begin{cases}
a_i(t) & q(t) = 0 \\[4pt]
\dfrac{a_i(t)}{\sum_j a_j(t)}\, C & q(t) > 0
\end{cases} \tag{5.29}
\]
\[
\tau(t) = \frac{q(t)}{C} \tag{5.30}
\]
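A quick numeric reading of Equations (5.29)-(5.30) can be sketched in a few lines (illustrative assumption with assumed names, not library code): with a backlog at capacity C, the current arrivals are scheduled to depart τ = q/C later, in proportion to their share of the total input.

```python
# Sketch of Eqs. (5.29)-(5.30): future departure rates and forward delay
# computed from the *current* arrival rates and queue size.

def forward_departures(a, q, C):
    tau = q / C                        # Eq. (5.30): current queuing delay
    total = sum(a)
    if q == 0:                         # Eq. (5.29), first case
        return list(a), tau
    # Eq. (5.29), second case: proportional share of the capacity C
    return [ai / total * C for ai in a], tau

# q = 600 Kb backlog at C = 100 Kbps: arrivals [150, 50] Kbps will
# depart 6 s later at rates [75, 25] Kbps.
rates, tau = forward_departures([150.0, 50.0], q=600.0, C=100.0)
```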
Equations (5.7)-(5.8) in the original system express the current departure rate $d_i(t)$ in terms of the past arrival rates $a_i(t - \tau(t))$, which required the implicit form of $\tau(t)$. Equations (5.29)-(5.30) are equivalent, except that they express the future departure rate $d_i(t + \tau(t))$ in terms of the current arrival rates $a_i(t)$. This removes the implicit form for $\tau(t)$ but requires special
5.4. Numerical Solving of Retarded FDEs with Implicit Delays 113

handling of the delay forward in time t + τ (t).

Equation (5.30) can be trivially calculated from q(t). For solving Equation (5.29), a new QSS
method is developed to approximate forward delay equations with the general form:

d(t + τ (t)) = y(t) (5.31)

where y(t) and τ (t) are known input signals.


We shall call the new method Forward Delay QSS (FDQSS) which is developed next.
FDQSS should be defined in the context of QSS, and a DEVS representation for the FDQSS
block must be found in order to implement it in a DEVS simulator.
The first step towards this aim, described in Section 5.4.2.2, is to define mathematically the
operation of a generic Delay block (Figure 5.8, left). After that in Section 5.4.2.4, we shall provide
the DEVS equivalent model D (Figure 5.8, right) that implements the obtained mathematical
definition.

5.4.2.1 Definitions
Figure 5.8 shows a block diagram of the input and expected output signals. The output delayed
signal d(t) is computed based on the input signals y(t) and τ (t).

Figure 5.8: New FDQSS block. The mathematical operation for the expected input and output
signals (left). DEVS equivalent block diagram model (right).

Congruent with QSS, the output delayed signal d(t) can be expressed as a sequence of polyno-
mial segments valid on adjacent time intervals given by:

\[
d(t) = d_{0,l} + d_{1,l}(t - t^d_l) + d_{2,l}(t - t^d_l)^2 + \dots \tag{5.32}
\]
\[
t^d_l \leq t < t^d_{l+1} \tag{5.33}
\]

where $t^d_l$ is a sequence of time instants with $t^d_l < t^d_{l+1}$, and $d_{0,l}, d_{1,l}, d_{2,l}, \dots$ are the corresponding sequences of polynomial coefficients for $d(t)$. We need to compute the time sequence $t^d_l$ and the coefficient sequences $d_{0,l}, d_{1,l}, d_{2,l}, \dots$
Similarly, the input signals y(t) and τ (t) can be expressed as a sequence of polynomial segments:

\[
y(t) = y_{0,k} + y_{1,k}(t - t^y_k) + y_{2,k}(t - t^y_k)^2 + \dots \tag{5.34}
\]
\[
t^y_k \leq t < t^y_{k+1} \tag{5.35}
\]
\[
\tau(t) = \tau_{0,j} + \tau_{1,j}(t - t^\tau_j) + \tau_{2,j}(t - t^\tau_j)^2 + \dots \tag{5.36}
\]
\[
t^\tau_j \leq t < t^\tau_{j+1} \tag{5.37}
\]

where $t^y_k$ is a sequence of time instants with $t^y_k < t^y_{k+1}$, and $y_{0,k}, y_{1,k}, y_{2,k}, \dots$ are the corresponding sequences of polynomial coefficients for $y(t)$. Similarly, $t^\tau_j$ and $\tau_{0,j}, \tau_{1,j}, \tau_{2,j}, \dots$ are the equivalent counterparts for $\tau(t)$.
Note that under this representation, the polynomial coefficients can change their values only at the beginning of each interval. E.g., taking the case of $y(t)$, intervals commence at instants $t^y_k$, so the coefficients remain constant throughout any time interval $t \in [t^y_k, t^y_{k+1})$. Also, there is no predefined relationship between the time instants $t^y_k$ and $t^\tau_j$ for the input signals $y(t)$ and $\tau(t)$, which evolve independently. The resulting timing sequence $t^d_l$ for the output signal $d(t)$ will depend on $t^y_k$ and $t^\tau_j$ in a way that we will describe next.

5.4.2.2 Analytic calculation of polynomial coefficients for the FDQSS method


We wish to compute the piecewise polynomial trajectories of d(t) such that d(t + τ (t)) = y(t),
where y(t) and τ (t) are also piecewise polynomial trajectories as defined before. To completely
define d(t) we need to compute:

• the sequence of time instants $t^d_0, \dots, t^d_l$

• the sequences of polynomial coefficients $d_{0,l}, d_{1,l}, d_{2,l}, \dots$ corresponding to each time instant in the sequence above

Figure 5.9: FDQSS algorithm based on polynomial segments. Sequences of time instants and
polynomial coefficients

For the calculation of the coefficients $d_{0,l}, \dots, d_{n,l}$, we shall take an arbitrary point in time $t^*$ where $y$ and $\tau$ are in their $k$-th and $j$-th segments respectively, as shown in Figure 5.9. At time $t^* + \tau(t^*)$, $d$ shall be in its $l$-th segment, and must verify that $d(t^* + \tau(t^*)) = y(t^*)$ by setting the coefficients as follows. We shall omit hereafter the $j$, $k$, and $l$ sub-indices of the corresponding sequences to avoid notation clutter.

\[
d(t + \tau(t)) = y(t) \tag{5.38}
\]
\[
d_0 + d_1\,(t + \tau(t) - t^d) + \dots = y_0 + y_1\,(t - t^y) + \dots \tag{5.39}
\]
\[
d_0 + d_1\,(t + \tau_0 + \tau_1(t - t^\tau) + \dots - t^d) = y_0 + y_1\,(t - t^y) + \dots \tag{5.40}
\]

Removing all coefficients for orders higher than 1 (as would be done for QSS2) we obtain the following expressions for the coefficients of $d$ (here $r_0$, $r_1$ and $t^r$ denote the $\tau$-segment coefficients $\tau_0$, $\tau_1$ and its base time $t^\tau$):
\[
d_0 = d_1\,(r_1 t^r - r_0 + t^d) - t^y y_1 + y_0 \tag{5.41}
\]
\[
d_1 = \frac{y_1}{r_1 + 1} \tag{5.42}
\]
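The QSS2 coefficients (5.41)-(5.42) can be checked numerically: building d(t) from arbitrary first-order segments of y and τ must reproduce d(t + τ(t)) = y(t) at every t inside the segment. The snippet below is a verification sketch (variable names follow the equations; the concrete segment values are arbitrary assumptions):

```python
# Verification sketch for Eqs. (5.41)-(5.42): with first-order segments
# y(t) = y0 + y1 (t - ty) and τ(t) = r0 + r1 (t - tr), the polynomial
# d(t) = d0 + d1 (t - td) must satisfy d(t + τ(t)) = y(t).

y0, y1, ty = 2.0, 0.5, 1.0      # arbitrary y segment
r0, r1, tr = 0.3, 0.2, 0.5      # arbitrary τ segment
td = 1.8                        # arbitrary base time for d

d1 = y1 / (r1 + 1.0)                          # Eq. (5.42)
d0 = d1 * (r1 * tr - r0 + td) - ty * y1 + y0  # Eq. (5.41)

for t in (0.7, 1.3, 2.0):
    tau = r0 + r1 * (t - tr)
    lhs = d0 + d1 * ((t + tau) - td)          # d(t + τ(t))
    rhs = y0 + y1 * (t - ty)                  # y(t)
    assert abs(lhs - rhs) < 1e-12
```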
Removing all coefficients of orders higher than 2 (as would be done for QSS3) results in an expression that contains terms with orders 3 and 4. Removing the terms with orders 3 and 4 introduces an approximation error, but allows obtaining the following expressions for the coefficients of $d$ in QSS3:
\[
\begin{aligned}
d_0 ={}& y_0 - t^y y_1 - d_2 r_0^2 + t^d d_1 + y_2 (t^y)^2 - d_2 r_2^2 (t^r)^4 - d_2 (t^d)^2 \\
&+ r_0\,[2 t^d d_2 - d_1] \\
&- r_1\,[2 t^d t^r d_2 - 2 t^r d_2 r_0 - (t^r)^2 d_2 r_1^2 - t^r d_1] \\
&+ r_2\,[2 (t^r)^3 d_2 r_1 + 2 t^d (t^r)^2 d_2 - 2 (t^r)^2 d_2 r_0 - (t^r)^2 d_1]
\end{aligned}
\tag{5.43}
\]
\[
d_1 = \frac{-n}{2 t^r r_2 - r_1 - 1}
\tag{5.44}
\]
with
\[
\begin{aligned}
n ={}& 4 (t^r)^3 d_2 r_2^2 + 2 t^r d_2 r_1^2 + 2 t^d d_2 - 2 d_2 r_0 \\
&+ 2 r_1\,[(t^d + t^r) d_2 - d_2 r_0] \\
&- 2 r_2\,[3 (t^r)^2 d_2 r_1 - 2 t^r d_2 r_0 + (2 t^d t^r + (t^r)^2) d_2] + 2 t^y y_2 - y_1
\end{aligned}
\]
\[
d_2 = -\,\frac{r_2 y_1 - \big(2 (t^r + t^y) r_2 - r_1 - 1\big) y_2}{8 (t^r)^3 r_2^3 - r_1^3 - 12 (t^r r_1 + (t^r)^2) r_2^2 - 3 r_1^2 + 6 (t^r r_1^2 + 2 t^r r_1 + t^r) r_2 - 3 r_1 - 1}
\tag{5.45}
\]

Regarding the time instants $t^d_0, \dots, t^d_l$ (at which $d(t)$ is updated), we observe that the coefficients of $d(t)$ depend on all coefficients of $y(t)$ and $\tau(t)$, and also on the time instants of their changes ($t^y_k$ and $t^\tau_j$). Thus, the coefficients of $d(t)$ will remain valid until either the $y(t)$ or the $\tau(t)$ polynomial changes at time $t'$ (i.e., we let $t' = \min(t^y_{k+1}, t^\tau_{j+1})$).
Thus, at time $t^d_{l+1} = t' + \tau(t')$, $d(t)$ shall update the coefficients of its $(l+1)$-th segment as calculated before.

Moreover, if $t'_0, \dots, t'_m$ is the ordered sequence of time changes for both $y(t)$ and $\tau(t)$ (with $t'_i \in t^y \cup t^\tau$, $t'_i < t'_{i+1}$ and $m = k + j$), we can define the sequence of time instants $t^d_l$ as follows:
\[
t^d_i = t'_i + \tau(t'_i) \tag{5.46}
\]
5.4.2.3 Conditions for the Time Instants of Polynomial Segments


For the sequence in Equation (5.46) to be valid, we must verify that $t^d_i < t^d_{i+1}$ always holds. Expanding the definition:
\[
t^d_i < t^d_{i+1} \tag{5.47}
\]
\[
t'_i + \tau(t'_i) < t'_{i+1} + \tau(t'_{i+1}) \tag{5.48}
\]
\[
t'_i + \tau_{0,j} + \tau_{1,j}(t'_i - t^\tau_j) + \dots < t'_{i+1} + \tau_{0,j'} + \tau_{1,j'}(t'_{i+1} - t^\tau_{j'}) + \dots \tag{5.49}
\]

where $j'$ can be either $j' = j$ or $j' = j + 1$, depending on whether the time change corresponds to a change in $y$ or in $\tau$, respectively.
If the time change corresponds to an update in $y$ ($j' = j$, $t'_{i+1} = t^y_{k+1}$):
\[
t'_i + \tau_{0,j} + \tau_{1,j}(t'_i - t^\tau_j) + \dots < t'_{i+1} + \tau_{0,j} + \tau_{1,j}(t'_{i+1} - t^\tau_j) + \dots
\]
\[
t'_i - t'_{i+1} < \tau_{1,j}(t'_{i+1} - t^\tau_j) + \dots - \tau_{1,j}(t'_i - t^\tau_j) - \dots
\]
(for QSS2)
\[
t'_i - t'_{i+1} < \tau_{1,j}(t'_{i+1} - t'_i) \;\Longrightarrow\; -1 < \tau_{1,j} \tag{5.51}
\]
(for QSS3)
\[
t'_i - t'_{i+1} < \tau_{1,j}(t'_{i+1} - t'_i) + \tau_{2,j}\big((t'_{i+1} - t^\tau_j)^2 - (t'_i - t^\tau_j)^2\big) \tag{5.52}
\]

If the time change corresponds to an update in $\tau$ ($j' = j + 1$, $t'_{i+1} = t^\tau_{j+1}$):
\[
t'_i + \tau_{0,j} + \tau_{1,j}(t'_i - t^\tau_j) + \dots < t'_{i+1} + \tau_{0,j+1} + \tau_{1,j+1}(t'_{i+1} - t^\tau_{j+1}) + \dots \tag{5.53}
\]
\[
t'_i - t'_{i+1} < \tau_{0,j+1} - \tau_{0,j} - \tau_{1,j}(t'_i - t^\tau_j) - \dots \tag{5.54}
\]
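The necessity of condition (5.51) can be seen with a two-line numeric check (illustrative sketch, assumed names): if the delay shrinks faster than time advances (τ₁ < −1), the scheduled output instants t'ᵢ + τ(t'ᵢ) of Eq. (5.46) come out in reverse order.

```python
# Sketch: output instants of Eq. (5.46) for a first-order τ segment.
# With τ1 = -1.5 < -1 the QSS2 condition (5.51) is violated and the
# output time sequence is no longer increasing.

def out_time(t, tau0, tau1, t_tau):
    return t + tau0 + tau1 * (t - t_tau)   # t' + τ(t')

t_a = out_time(0.5, 2.0, -1.5, 0.0)   # 1.75
t_b = out_time(1.0, 2.0, -1.5, 0.0)   # 1.50 -> earlier: ordering broken
assert t_b < t_a
```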

5.4.2.4 DEVS-based Algorithm to Obtain d(t + τ (t)) = y(t)


Following the QSS event-based nature and polynomial-based representation of signals, the calcu-
lations shown before can be implemented as a DEVS atomic model.
Below is the specification of the DEVS atomic model functions that follow the analytic calcu-
lations of Equation (5.40). The simulation time advance function uses a queue to schedule output
events forward in time following Equation (5.46):

• Model State: The model state is composed by the following variables:

– $q = \{q_0, \dots, q_n\}$ where $q_i \in (\mathbb{R}, P)$: sorted list of the output time instants $t^d_l$ and the corresponding polynomials $d_l$

– $y \in (\mathbb{R}, P)$: lastly arrived polynomial for the input signal $y$, with its corresponding arrival time $t^y$
– $\tau \in (\mathbb{R}, P)$: lastly arrived polynomial for the input signal $\tau$, with its corresponding arrival time $t^\tau$
• Time Advance Function: returns infinite if $q$ is empty. Otherwise, the next internal transition is scheduled to occur at $q_0.first$ (the time in the first element of the queue)
• Output Function: sends an output event with the polynomial of $q_0$, which was already calculated in the external transition
• Internal Transition Function:
– Advance1 the y and τ polynomials with the elapsed time
– Pop q0 from the queue (removes the polynomial that was just sent)
• External Transition Function: Upon receiving a new $y$ or $\tau$ segment at time $t$:
– Update either the $y$ or the $\tau$ polynomial with the newly arrived segment. Update its corresponding arrival time, setting either $t^y = t$ or $t^\tau = t$
– Advance¹ by the elapsed time the polynomial ($y$ or $\tau$) which was not updated in the previous step.
– Enqueue a new element $q_n$ in the queue $q$ with the following values:
∗ Output time $t^d_l$: calculated as in Equation (5.46) using $t'_i = t$ (the current time) and $\tau(t'_i) = \tau_0$ ($\tau$ was just updated or advanced¹, so we can use its first coefficient directly)
∗ Output polynomial $d_l$: calculated as in Equation (5.40) using the coefficients of $y$ and $\tau$, and their corresponding arrival times $t^y$ and $t^\tau$.
Note: calculations can be greatly simplified when using the coefficients of the advanced¹ polynomials of $y$ and $\tau$. For example, $d_0$ simplifies to $d_0 = y_0$ when replacing $t^d$ with its corresponding value $t^d = t^r + \tau(t^r) = t^r + r_0$ or $t^d = t^y + \tau(t^y)$, according to Equation (5.46)
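The transition functions above can be sketched in a few lines of code for the first-order case. This is an illustrative approximation of the DEVS atomic model, not the actual PowerDEVS C++ block, and all names are assumptions; the simplification d₀ = y₀ applies because both input polynomials are advanced to the current time before each computation:

```python
import heapq

class FDQSSSketch:
    """First-order sketch of the FDQSS atomic model: d(t + τ(t)) = y(t)."""

    def __init__(self):
        self.y = None          # (y0, y1) based at the last event time
        self.tau = None        # (r0, r1) based at the last event time
        self.t_last = 0.0
        self.queue = []        # scheduled outputs: (t_d, d0, d1)

    def _advance(self, t):
        dt = t - self.t_last   # rebase both polynomials at time t
        if self.y:
            self.y = (self.y[0] + self.y[1] * dt, self.y[1])
        if self.tau:
            self.tau = (self.tau[0] + self.tau[1] * dt, self.tau[1])
        self.t_last = t

    def external(self, t, y=None, tau=None):
        """External transition: a new y or τ segment arrives at time t."""
        self._advance(t)
        if y is not None:
            self.y = y
        if tau is not None:
            self.tau = tau
        if self.y and self.tau:
            r0, r1 = self.tau
            d1 = self.y[1] / (r1 + 1.0)     # Eq. (5.42)
            d0 = self.y[0]                  # simplified Eq. (5.41)
            heapq.heappush(self.queue, (t + r0, d0, d1))  # Eq. (5.46)

    def outputs_until(self, t_end):
        """Pop the output events scheduled up to t_end, in time order."""
        out = []
        while self.queue and self.queue[0][0] <= t_end:
            out.append(heapq.heappop(self.queue))
        return out

# y(t) = t (unit ramp) and constant delay τ = 0.5: events at t = 0, 1, 2
m = FDQSSSketch()
for t in (0.0, 1.0, 2.0):
    m.external(t, y=(t, 1.0), tau=(0.5, 0.0))
# outputs appear at 0.5, 1.5, 2.5, carrying the delayed ramp d(t) = t - 0.5
```

Feeding a ramp through a constant delay is a convenient smoke test: the scheduled output polynomials must reproduce the input shifted by exactly τ.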
The atomic DEVS model just described was implemented as a new generic, standalone and
reusable FDQSS block in the PowerDEVS continuous systems library. Figure 5.10 compares ex-
isting QSS delay blocks with the new FDQSS block, all of which calculate dynamic signals by
applying delays in different ways:
• The Delay block computes: d(t) = y(t − D).
• The VariableDelay block computes: d(t) = y(t − τ (t)).
• The ForwardDelay block computes: d(t + τ (t)) = y(t),
where y(t) and τ (t) are input dynamic signals and D is a constant parameter.

¹ By "advancing a polynomial $p(t) = p_0 + p_1(t - t^p) + \dots$ (which is based on time $t^p$) by $\Delta t$" we refer to the operation of updating the polynomial coefficients so as to get an equivalent polynomial $p'$ based on time $t^p + \Delta t$ (e.g. $p'_0 = p_0 + p_1 \Delta t$).

(a) Fixed Delay QSS (b) Delay QSS (c) Forward Delay QSS

Figure 5.10: Comparison of QSS Delay Blocks available in the PowerDEVS continuous systems
library.

5.4.3 Experiments with the Buffer-Server System


Experiments are performed to compare the fluid buffer-server system numerical approximations
against equivalent packet-level queues. The two strategies presented above for solving the implicit
equation (5.7) are compared in terms of results and performance.
Figures 5.11a and 5.11b show the experiment setups using a fluid buffer-server system and a packet-level queue, respectively. The fluid model uses continuous functions as inputs to the buffer-server system block, which are set up differently in each experiment. The packet-level simulation uses the UDP sender model and a tail-drop queue model presented in Chapter 4. UDP hosts are configured with a constant packet size of 1Kb and an exponentially distributed inter-packet generation time with a mean modulated by the throughput of the fluid sources. For plotting individual packets as rates, the sum of all packets is gathered over a fixed time period of 10ms.

(a) Fluid buffer-server system setup. (b) Packet-level queue setup.

Figure 5.11: Experiment setups to compare the packet-level queue and the fluid buffer-server system.
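To make the fluid side of the setup concrete, the bounded queue integration and its saturation drops can be approximated with a scalar Euler loop. This is a hedged sketch of the behavior described by Equations (5.5)-(5.6) (names and clipping logic are assumptions; the real model uses QSS and the bounded integrator of Section 5.4.4):

```python
# Sketch of the fluid buffer-server: q' = Σa − C, with q clipped to
# [0, Qmax] and the excess input dropped when the buffer saturates
# (assumed simplification of Eqs. (5.5)-(5.6)).

def run_queue(a_total, C, Qmax, t_end, dt):
    q, drop = 0.0, 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        dq = a_total - C
        drop = dq if (q >= Qmax and dq > 0) else 0.0   # tail-drop rate
        q = min(max(q + (dq - drop) * dt, 0.0), Qmax)
    return q, drop

# C = 100 Kbps, Qmax = 600 Kb, constant input 150 Kbps: the queue fills
# at 50 Kb/s, saturates at t = 12 s, then drops 50 Kbps thereafter.
q, mu = run_queue(a_total=150.0, C=100.0, Qmax=600.0, t_end=20.0, dt=0.01)
```

This reproduces the qualitative behavior visible in Figure 5.12: a linear queue build-up followed by a constant drop rate once the buffer is full.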

Figure 5.12 shows results for the first experiment, where the buffer-server system is configured
with a capacity C = 100Kbps and a maximum buffer size of Qmax = 600Kb. The input flows
a1 (t), a2 (t) to the buffer-server system follow very simple piece-wise constant trajectories, as shown
in the top row. The second row shows the queue size q(t) calculated from Equation (5.5). The
third row shows the departure rates d1 and d2 calculated as per Equation (5.7), and the bottom
row shows the drop rate µ(t) as per Equation (5.6).
The effect of the delay in the departure rate can be observed. For example, at 17s the second host stops, but the queue has a $waitTime = 600\mathrm{Kb}/100\mathrm{Kbps} = 6\mathrm{s}$, so there are still packets from the second host departing until 23s. Similarly, it is easy to observe how the queue size q(t) respects the limits [0, 600Kb] thanks to the boundedIntegrator. At 14.5s the queue gets full and starts generating drops proportional to the input rates. The fluid buffer-server system shows excellent concordance with the packet-by-packet simulation.

Figure 5.13 shows the results for the second experiment. More complex input patterns are used to further compare each approach to solve Retarded FDEs, and to test the accuracy of the fluid buffer-server against packet-level simulations with stochastic packet sizes. The buffer-server system is configured as before and the input flows follow linear $a_1(t)$ and sinusoidal $a_2(t)$ trajectories, as shown in the top row. The second and third rows show the departure rates $d_1$ and $d_2$, and include results for modeling τ as a dynamical system and for using the new FDQSS block. In this case, packet-level UDP hosts use packet sizes following an exponential probability distribution with mean=1000Kb.
Both approaches to solve Retarded FDEs yield very similar results (with minimal expected differences). By using stochastic packet sizes we confirm that the fluid model captures an averaged behavior only, ignoring the variance of the stochastic process. We conclude that the buffer-server system provides an excellent approximation to the packet-level queue even in the presence of input noise.

Table 5.2 shows a comparison of both approaches to solve Retarded FDEs. The FDQSS block provides a 10% performance improvement over the dynamic delay calculation. Also, FDQSS is easier to use, as it requires no extra parameters. The implementation of the dynamic delay calculation relies on an additional integrator (which requires choosing $dqmin$, $dqrel$, $x_0$, $Q_{max}$, and $Q_{min}$ parameters) along with a DQSS block (which requires an initial history and initial values as parameters). Moreover, the FDQSS block can be reused in other models which use equations of the form $f(t + \tau(t)) = y(t)$, whereas the dynamic delay calculation approach is specific to the buffer-server system. Also, for the buffer-server system in particular, the forward delay equations provide a more intuitive way of understanding the delay: the currently incoming flow is scheduled to depart after the current queuing time.
On the other hand, the dynamic delay approach can be implemented with already developed and tested blocks, so it is in principle more reliable. Also, it implements the implicit equation as originally written, although it requires expressing the dynamics of the delay.
In the rest of the models included in this Thesis we will use the new Forward Delay unless explicitly stated otherwise. The following table summarizes advantages and disadvantages of each approach:

Model | Performance | Parameterization | Reliability | Modular Block | Equations
Dynamic Delay (dτ(t)/dt) | 10% slower | dqmin, dqrel, x0 for integrator; history and initial value for DQSS block | Blocks available since 2015 | Yes | Rewrite equation for delay dynamics
Forward Delay (FDQSS) | 10% faster | No parameters required | New block | Yes | Rewrite equation with forward delay

Table 5.2: Comparison of the Dynamic and Forward delay approaches for FDEs.

Figure 5.12: Comparison of the fluid buffer-server system and packet-level simulations using step-
wise inputs and constant packet sizes.

Figure 5.13: Comparison of FDQSS, dynamic calculation of τ and discrete packet simulation using
a buffer-server system

5.4.4 Numerical Simulation of Sharp Discontinuities: The QSS Bounded Integrator
Regarding sharp discontinuities, discrete-time numerical solvers must resort to iterative backward calculation techniques [167] in order to accurately detect switching conditions (see Section 5.2.4). For example, in the MGT model no upper bound is considered, so tail-drop behavior cannot be properly captured, and the lower bound is guaranteed by an ad hoc indicator condition $1_{q(t)>0}$.
We developed a new QSS Bounded Integrator (QBI), where the detection of discontinuities is straightforward and efficient due to the inherent discrete-event nature of the QSS methods [78]. The QBI detects switching conditions to halt integration and guarantee that the resulting integrated value remains within maximum $Q_{max}$ and minimum $Q_{min}$ values.
QBI was added into the PowerDEVS’ hybrid library as a reusable block for generic purposes
(e.g.: to model a generic finite reservoir), as shown in Figure 5.14. For network modeling, the QBI
is used to model a buffer-server system with limited capacity and a TCP host where the congestion
window has maximum and minimum values. The QBI block interface, shown in Figure 5.14a, is
the same as in the classic QSS integrator so that both can be used interchangeably, while QBI
accepts 2 additional parameters for the maximum and minimum values.
Figure 5.14b shows the PowerDEVS implementation of the QBI block, reusing the standard QSS integrator (which receives $\dot q$ and calculates $q$) and guaranteeing that the resulting integrated value $q$ remains within the configured maximum $Q_{max}$ and minimum $Q_{min}$ values. To keep the value $q$ within the boundaries, QBI sets the value of $\dot q$ to a new value $\dot q_b$, which requires detecting the switching of 2 conditions for each boundary (a total of 4 conditions):

1. When the integrated value $q(t)$ would cross either boundary (saturation condition). Here $\dot q_b(t)$ is set to 0 to guarantee that $q(t)$ stays constant at the saturation value.

2. When the value of $\dot q(t)$ allows $q(t)$ to leave the saturation condition. In this case $\dot q_b(t)$ is set to $\dot q(t)$ to continue with the standard QSS integration.

Formally, $\dot q_b(t)$ within the QBI block is defined by the following expression:
\[
\dot q_b(q(t), \dot q(t)) =
\begin{cases}
0, & \text{if } \big(q(t) = Q_{max} \ \text{AND} \ \dot q(t) \geq 0\big) \quad \text{(persistent HIGH saturation)} \\
& \text{OR } \big(q(t) = Q_{min} \ \text{AND} \ \dot q(t) \leq 0\big) \quad \text{(persistent LOW saturation)} \\[4pt]
\dot q(t), & \text{otherwise}
\end{cases} \tag{5.55}
\]
where:

• $q(t)$ is calculated by a standard QSS block, and is bounded with $Q_{min} \leq q(t) \leq Q_{max}$.

• $\dot q(t)$ is the signal to be integrated, received as an input by the QBI block.

• $\dot q_b(t)$ is the input to a standard QSS block.
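The difference between integrating q̇_b of Eq. (5.55) and merely capping the output can be reproduced with a small Euler sketch (an illustrative assumption, not the PowerDEVS blocks; the test signal is invented): the capped integrator keeps accumulating internally while saturated and therefore releases late, while the QBI releases immediately.

```python
# Sketch comparing Eq. (5.55) with a naive capped integrator.
# Assumed test signal: derivative +100 for t < 4 s, then -100.

def qdot_b(q, qdot, qmin, qmax):            # Eq. (5.55)
    if (q >= qmax and qdot >= 0) or (q <= qmin and qdot <= 0):
        return 0.0
    return qdot

def simulate(t_end, dt=0.001, qmin=0.0, qmax=250.0):
    q_qbi, x = 0.0, 0.0                     # x: capped-integrator internal state
    steps = int(round(t_end / dt))
    for k in range(steps):
        qdot = 100.0 if k * dt < 4.0 else -100.0
        q_qbi += qdot_b(q_qbi, qdot, qmin, qmax) * dt
        q_qbi = min(max(q_qbi, qmin), qmax) # guard against Euler overshoot
        x += qdot * dt                      # naive: keeps integrating
    return q_qbi, min(max(x, qmin), qmax)   # capped output

q_qbi, q_cap = simulate(4.5)
# At t = 4.5 s the QBI already left saturation and decreased to 200,
# while the capped integrator is still stuck at 250.
```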



Figure 5.14: QBI block available in the hybrid library implemented in PowerDEVS. (a) Modular QBI block. (b) QBI internal overview: reuses a standard QSS integrator and implements Equation (5.55).

Equation (5.55) is implemented within the q_der_b coupled model in Figure 5.14. For the detection of switching conditions it relies on standard QSS Comparator and Switch blocks.
Figure 5.15 shows illustrative examples comparing the behavior of a standard QSS integrator, a QBI integrator, and an integrator with its output capped at the saturation levels. In Figure 5.15b the QSS integrator imposes no boundaries and performs the usual integration throughout the simulation. In Figure 5.15c the QBI exactly detects boundary conditions, halts integration when boundaries are reached, and properly releases from saturation when the derivative signal changes. Meanwhile, the third option in Figure 5.15d caps the output at the saturation levels. Yet, this is not a valid option, as the internal state variable continues integrating the input signal and therefore takes some extra time to leave the upper saturation level, which is not the expected behavior (see values at t = 3s or t = 5s).

(a) Integrator Test Model. (b) Standard QSS integrator. (c) QSS Bounded Integrator. (d) Standard QSS Integrator with a capped output.

Figure 5.15: Example comparing the available QSS integrators. Boundaries for (c) and (d) are set to $Q_{min} = 0$ and $Q_{max} = 250$.

5.5 A Modular Approach for Fluid-Flow Network Modeling


This section introduces a new library of fluid-flow models that provides the basic building blocks to
compose fluid network topologies. Basic Low-Level blocks contain the implementation of a subset
of the required ODEs. Similar to the Packet-Level approach, basic blocks are coupled together
to describe high-level models, which in turn represent top-level elements of a network topology.
The interconnection of said High-Level models defines the final overall set of ODEs that govern the dynamics of the fluid network.
The proposed approach decouples three domains of knowledge required to produce fluid simulations: a) the description of the network (network modeling), b) the implementation of ODEs (ODE modeling) and c) ODE numerical solving (ODE simulation). Each domain requires specific knowledge and should ideally be handled by different experts. In traditional approaches these aspects are tightly coupled, requiring network experts to acquire knowledge in continuous dynamic systems, while ODE experts are required to code network topologies and numerical solvers. Conversely, with the technology presented in this work, network experts can specify fluid network topologies without knowledge of ODEs, using fluid blocks from a pre-defined fluid-flow library. Equations for the continuous dynamics are defined with the PowerDEVS block-based diagrams, which can be tested independently from other network elements and are easily interchangeable. Meanwhile, ODE experts can focus on better expressing network dynamics mathematically in the form of their fluid limits, without concern for issues such as topological information, simulation techniques or numerical solvers. The latter are handled transparently by the QSS methods and the DEVS simulation engine, which can switch between available solvers without changing the equations' block diagrams.

5.5.1 Introducing Fluid Entities for Simplified Fluid Modeling


In the context of DEVS blocks and QSS signals, we often need to describe and communicate sets of attributes that are related to each other and/or are part of the same abstract entity.
Classically, each attribute needs to be communicated among models using a separate DEVS port. In large models with many such entities, there is an explosion in the number of required ports, which can complicate the modeling tasks considerably (e.g. hundreds or thousands of "wires" connecting too many ports render the visual representation of models useless).
We developed new models to bundle together multiple QSS signals and their properties into a single Fluid Entity, which can be seen as a fluid multiplexing. This approach drastically reduces the number of ports required, allowing models to exchange full entities instead of individual signals and attributes.
Fluid Entities represent a contribution in their own right, useful for general continuous systems modeling with DEVS and QSS. However, they will be explained here in the context of fluid-flow based network modeling.
We consider a "flow" as a continuum or stream of data that shares some common characteristics: it traverses the same route and experiences common round-trip and propagation delays. Each flow can then be described by the attributes listed below, and then encapsulated in a Flow type of Fluid
Entity (see Figure 5.16). This allows models to receive, manipulate and output Fluid Entities
instead of multiple independent QSS signals. Moreover, a Fluid Entity provides a common data
structure hierarchy for both fluid-flow and packet-level models.

• Flow ID: unique identifier for streams of data that share common characteristics. This field is also present in the packet-level data structure, as it allows common behaviour such as flow-based routing. Noted as a subscript of other attributes (e.g. $d_1(t)$ denotes the departure rate of Flow ID=1)

• Rate: amount of data transmitted per unit of time. Noted as ai (t) or di (t) for input (arrival)
and departure rates, respectively.

• Accumulated Drop Rate: total drops per unit of time experienced by the flow along its
route. Noted as µi (t).

• Accumulated Delay: total delay experienced by the flow along its route. Noted as τi (t).

The units for parameters and input/output signals are a choice of the modeler and need to be consistent. E.g. they can be measured in packets and seconds, so that rates ($a_i$, $d_i$, $\mu_i$ and $C$) will be measured in packets/s, while $q$, $Q_{max}$ and $Q_{min}$ will be measured in packets. Another typical alternative is bit/s.

To enable the creation, transmission and manipulation of attributes of Entities, the following
models were added to the PowerDEVS library:

• Entity class hierarchy: it is depicted in Figure 5.16. The IEntity abstract interface
generalizes all possible entities while the IAttribute interface represents attributes. Concrete
entities (such as a FluidFlow ) combine related attributes (fields and QSS signals).

• Entity Creator block: This DEVS atomic model shown in Figure 5.17a has as many input
ports as attributes exist for an Entity. The Entity Creator groups together all incoming
signals and outputs DEVS events containing the new Entity. The lifetime of Entity objects
is related to the validity of its attributes. Every time a new input is received (e.g. an update
of a QSS polynomial) a new instance of the Entity is created with the corresponding attribute
marked as updated.
In other words, each new instance of an entity represents a discrete event that updates at least one of the Entity's attributes (very similar to the concept of updating the polynomial coefficients of a QSS signal, i.e. a QSS update event).
Two Entity objects with the same ID might have different attribute values at different points in a large model topology. For example, a FluidFlow entity for "flow1" may enter the network with Accumulated Delay=0, but at the end of its route it will have experienced some Accumulated Delay.

• Attribute Selector block: The Attribute Selector atomic model shown in Figure 5.17b receives an Entity and outputs only a selected attribute. This allows for the manipulation of individual attributes with any standard DEVS block. An Attribute Selector model will generate a new output event only if the entity has its corresponding attribute marked as "updated". This prevents generating redundant, useless events when a new Entity arrives but its attribute of interest remains the same as in its predecessor Entity.

• Attribute Composer block: The Attribute Composer atomic model shown in Figure 5.17b performs the inverse operation of the Attribute Selector: it receives QSS signals and sends Fluid Entities with the corresponding attributes marked as updated. The composer groups together all signals and sends DEVS events containing the new Entity.
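The update-flag semantics of these blocks can be sketched with a minimal data structure. The snippet below is an assumed Python rendition of the C++ hierarchy of Figure 5.16, for illustration only (names and types are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Attribute:
    value: float
    updated: bool = False    # set when the QSS polynomial changes

@dataclass
class FluidFlow:
    flow_id: str
    rate: Attribute          # a_i(t) / d_i(t)
    acum_drops: Attribute    # μ_i(t)
    acum_delay: Attribute    # τ_i(t)

def select(entity, name):
    """Attribute Selector semantics: emit the attribute only if updated."""
    attr = getattr(entity, name)
    return attr.value if attr.updated else None   # None = no output event

f = FluidFlow("flow1", Attribute(120.0, updated=True),
              Attribute(0.0), Attribute(0.0))
assert select(f, "rate") == 120.0        # rate changed: event emitted
assert select(f, "acum_delay") is None   # delay unchanged: no event
```

The point of the flag is exactly the one stated above: downstream blocks are only woken up by the attributes they care about, not by every new Entity instance.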

In data network models, Fluid Entities improve modeling by allowing network topologies to be represented in a similar fashion as packet-level topologies, much like modeling a pipeline network. Instead of using one DEVS port per signal (rate, drop rate, delay, etc.), top-level components (e.g. hosts, routers) connect with each other using a single DEVS port wherever a connection would exist in the real world. Moreover, this enables the routing of fluid flows throughout the network topology based on Fluid Entity attributes (such as the ID), and operations can be performed as flows traverse a non-predefined route (e.g. each queue adds its own delay, to finally obtain a fluid-flow round-trip time measurement).

(a) IEntity class hierarchy. (b) IFlow hierarchy.

Figure 5.16: Class Hierarchy for Fluid-Flow Entities. The Packet class, same as defined in Figure 4.8, shares the same data structure hierarchy with Packet-Level models.

(a) Entity Creator. (b) Attribute Selector and Entity Composer.

Figure 5.17: Models enabling Fluid Entities.

5.5.2 Basic Low-Level Components encapsulating ODEs


Low-Level fluid models are the building blocks that encapsulate the ODEs of specific network
components. These are generally DEVS coupled models that use standard QSS blocks. The newly
developed Low-Level fluid model library includes models to represent basic data sources, queuing

mechanisms, communication links and multiplexing. These models receive, manipulate and output
Fluid Entities but will be described in terms of the flow attributes to simplify the notation.

5.5.2.1 Fluid Data Network Sources: TCP Window and Unresponsive Sources
Data flow sources generate the input into the network by creating flow entities with a dynamic rate $d_i(t)$. These can be modeled as simple continuous functions that are independent of external inputs, to represent unresponsive network protocols, or as more complex functions that depend on the state of the network (such as round-trip time or drops), to represent congestion control protocols.
Basic unresponsive sources, shown in Figure 5.18a, set the flow rate independently of external signals. Basic sources might be useful in early stages of network design, when traffic pattern specifications are vague. Hosts' traffic can be represented with simple continuous functions, such as step-wise constant rates (to represent UDP hosts), oscillatory or trapezoidal functions (for cyclic data rates over longer time periods), or other custom-shaped functions of time. They can easily be replaced afterwards by more complex blocks. Some examples are given in Section 5.5.5. This type of traffic shaping is not considered in the MGT study.
Regarding congestion control protocols, we consider sources whose throughput is controlled by the
TCP protocol (Reno flavor). The TCP Window source is described in Table 5.4 and shown
in Figure 5.18b. It sets the flow rate di (t) according to a dynamic equation that depends on its
flow attributes after traversing the network. A TCP model for bulk transfer (with unlimited data
to send by the host) is built after the equations proposed in MGT:

\[ d_i(t) = \frac{W_i(t)\, N}{\tau_i(t)} \tag{5.56} \]

\[ \frac{dW_i(t)}{dt} = \frac{1}{\tau_i(t)} - \frac{W_i(t)}{2}\,\frac{\mu_i(t - \tau_i(t))}{a\,N} \qquad (1 \le W_i(t) \le W_{max}) \tag{5.57} \]

The throughput di is calculated based on the TCP window size Wi , the round-trip time τi , and
the total number of TCP sessions N . a is a parameter introduced in [24] which calibrates the loss
probability to compensate for the fact that TCP window size and perceived loss are not indepen-
dent, as assumed by this mean-value type of model. Note that here τi and µi are, respectively, the
accumulated delay (round-trip time, RTT) and discards that the flow experienced after travers-
ing each of the queues and links in its path. The first term of the window equation models additive
increase, growing by 1 every round-trip time, and the second term models multiplicative decrease,
halving the window according to the discard rate. TCP senses packet losses after approximately 1
RTT, so the discard rate is considered with a dynamic delay τi (t), transforming the equation into a
DDE (see Section 5.2.3) with the same considerations as discussed earlier for Equation (5.7). Also,
the sharp discontinuities introduced by the maximum and minimum size of the window deserve
the same considerations as discussed earlier for Equation (5.5).
5.5. A Modular Approach for Fluid-Flow Network Modeling 129

| Group | Name | Attribute/Parameter | Description |
| Input Attributes | Accum. Delay | τi(t) | Delay experienced by the flow to traverse the network and back (RTT) |
| Input Attributes | Accum. Drop Rate | µi(t) | Total drop rate experienced by the flow after traversing the network and back |
| Input Attributes | Flow Id | i | A unique ID is set for each flow |
| Modified Attributes | Departure Rate | di(t) | Send rate as defined by (5.56) |
| Modified Attributes | Accum. Drop Rate | µi(t) | Flows are created with 0 drop rate at the source |
| Modified Attributes | Accum. Delay | τi(t) | Flows are created with 0 delay at the source |
| Config. Parameters | Maximum Window | Wmax | Maximum allowed TCP window size, usually 65535 bits |
| Config. Parameters | #Sessions | N | Number of TCP sessions within the host |

Table 5.4: Block interface for the TCP host

(a) Basic Data Source (b) TCP Data Source

Figure 5.18: Fluid-Flow Data Source blocks in the Low-Level Library

5.5.2.2 Buffering Mechanisms: Tail-Drop and RED Queues


The tail-drop and RED queue models implement different buffering policies based on the Buffer-
Server system set of ODEs (5.5)-(5.8) explained above. Queue blocks use and modify flow attributes
as described in Table 5.6.
The tail-drop queue, shown in Figure 5.20a, implements Equations (5.5)-(5.8) using Fluid Entity
models to manipulate flow attributes. The RED queue block is implemented as a new separate
block, reusing the behavior of the tail-drop queue as shown in Figure 5.19. This provides a realistic
fluid representation of RED-enabled buffers, modeling both probabilistic discards (based on
the RED algorithm) and tail-drop discards (when the buffer gets full). The RED queue block
connects the tail-drop block with the RED behavior defined in Equations (5.58)-(5.61). The RED
block, shown in Figure 5.20b, provides the same input/output ports as the tail-drop queue (refer to
Table 5.6), allowing the modeler to easily switch between different AQM policies and enabling
topologies with heterogeneous queues (not possible in MGT).
In the next equations we will denote with RED: and queue: the internal (DEVS atomic)
blocks that are part of the larger (DEVS coupled) RED queue block depicted in Figure 5.19.

Figure 5.19: RED Queue DEVS coupled model composition

\[ \frac{d\hat{q}(t)}{dt} = \frac{\log_e(1-\alpha)}{\delta}\,\hat{q}(t) - \frac{\log_e(1-\alpha)}{\delta}\,\mathrm{queue{:}}q(t) \tag{5.58} \]

\[ \mathrm{RED{:}}\mu_{RED}(t) = \begin{cases} 0 & 0 \le \hat{q}(t) \le t_{min} \\ \dfrac{\hat{q}(t)-t_{min}}{t_{max}-t_{min}}\,p_{max} & t_{min} \le \hat{q}(t) \le t_{max} \\ 1 & t_{max} \le \hat{q}(t) \end{cases} \tag{5.59} \]

\[ \mathrm{queue{:}}a_i(t) = a_i(t)\,(1 - \mathrm{RED{:}}\mu_{RED}(t)) \tag{5.60} \]

\[ \mu_i(t) = \mathrm{queue{:}}\mu_i(t) + a_i(t)\,\mathrm{RED{:}}\mu_{RED}(t) \tag{5.61} \]

The discard probability is calculated based on two functions, as specified in [108]. Equation
(5.58) models the exponentially weighted moving average q̂ of the instantaneous queue size q as a
differential equation. Equation (5.59) models the classic RED discard probability function µRED
using the estimate q̂. Packets discarded by RED do not contribute to the buffer size (5.60) and
increase the total discard rate (5.61). In MGT, RED discards are not taken into account in the
buffer size calculation. In [10] the authors do consider tail-drop or RED discards, but not both.
The implementation of the ODEs is decoupled from the underlying ODE simulation. ODEs are
described in a graphical block-oriented form, while the simulation method to be used is a parameter
chosen before execution, for each integrator. Figure 5.21 shows the PowerDEVS imple-
mentation of the Buffer-Server system equations (5.5)-(5.8) using standard QSS blocks. Figures
5.21a and 5.21c show the use of the newly developed QBI and Forward Delay blocks described in
Section 5.4.

| Group | Name | Attribute/Parameter | Description |
| Input Attributes | Arrival Rate | ai(t) | N signals with the input rate of each incoming flow, denoted by the subindex i ∈ {1, ..., N} |
| Modified Attributes | Accum. Delay | τi(t) | Increments the flows' delay with the queuing waiting time, calculated as q(t)/C |
| Modified Attributes | Departure Rate | di(t) | Sets the flows' departure rate according to Equation (5.7) |
| Modified Attributes | Accum. Drops | µi(t) | Increments the flows' discard rate with the value calculated in (5.6) |
| Output Signals | Queue Size | q(t) | Evolution of the queue size as described by (5.5) |
| Config. Parameters | Buffer Size Limits | Qmin, Qmax | Upper and lower limits for q(t). Qmin < Qmax with Qmin, Qmax ∈ [−∞, +∞], where typically Qmin = 0, Qmax = K |
| Config. Parameters | Capacity | C | Service capacity of the outgoing link of the queue |
| Config. Parameters | Drop Prob. Function (only RED) | tmin, tmax, pmax | In RED queues, definition of the drop probability function as defined in (5.59) |
| Config. Parameters | Queue Size Estimate (only RED) | α ∈ [0, 1], δ ∈ R | In RED queues, weight and sampling rate for the exponentially weighted moving average (5.58). δ can be statically set to 1/C or dynamically to 1/Σi ai(t) |

Table 5.6: Block interface for the RED and tail-drop queue models.

(a) Tail-drop queue (b) RED queue

Figure 5.20: Fluid-Flow Queue blocks. They send/receive Fluid Entities and reuse the buffer-server system blocks.

(a) Queue Size - Equation (5.5)

(b) Queue Discards - Equation (5.6)

(c) Queue Departure - Equation (5.7)

(d) TCP Congestion Window - Equation (5.57)

Figure 5.21: Block-oriented representation of equations implemented in PowerDEVS



5.5.2.3 Multiplexing Fluid Entities: Fluid-Flow routing


We mentioned that all fluid-flow models use single DEVS port to deliver events containing an
Entity object from any flow (distinguished by the ID). Yet, a QSS block can only handle a single
input/output, so events must be first demultiplexed into single flows and then each flow’s attributes
are in turn demultiplexed with the attribute selector described above.
Three models were developed to multiplex Fluid Entities and allow the routing of fluid-flow
signals:

• Static Flow Demultiplexer: Demultiplexes incoming flows based on their IDs and static
configuration. The mapping between output port and flow ID is done at configuration time,
as shown in Figure 5.22

• Dynamic Flow Demultiplexer: Demultiplexes incoming flows using a different port for
each flow ID. The mapping between output port and flow ID is established at simulation
time, choosing port 0 for the first arriving ID, port 1 for the second arriving ID, and so on.

• Fluid-Flow Routing Table: Demultiplexes incoming flows based on a routing table taken
from configuration. The configuration maps Flow IDs with an outgoing port number for
each routing table block. This uses the flowId attribute (see Figure 5.16) allowing routing of
fluid-flow QSS signals to use the same mechanisms as used for flow-based routing of discrete
packets described in Chapter 4.

(a) Flow Demultiplexer Block (b) Static Flow Demultiplexer Parameters

Figure 5.22: Fluid-Flow demultiplexer

5.5.3 Modular Construction of Fluid-Flow Topologies


Following the bottom-up modeling approach and the same idea as for the Packet-Level models, the
low-level building blocks described earlier are composed to form more complex components. Top
level components, such as different types of routers or hosts, are built based on the buffer-server,
TCP/UDP sources and multiplexing models that encapsulate an underlying subset of ODEs. By
interconnecting these high-level models the final system of ODEs gets automatically defined.

5.5.3.1 High-Level Fluid-Flow Models


We consider DEVS coupled models that represent communication channels, queues and routers.
Physical cables interconnect two network nodes and impose bandwidth and propagation delays.
Only unidirectional links are considered, although bi-directional links can be trivially modeled as
two unidirectional links. Different links can be modeled by combining the queue and delay models
presented earlier. Figure 5.23a shows the EgressPort DEVS coupled model, composed of a
Queue and a StaticDelay model. The StaticDelay model represents the propagation delay p ≥ 0,
which is constant and modifies the flow rate by delaying it according to di (t) = ai (t − p) (note that
this is not a DDE, as the delay p does not change with time). The Queue model represents the
bandwidth capacity and its associated delay. It can be configured as finite with discard behaviour,
to represent the finite buffers of Network Interface Cards (NICs), or as an infinite ideal queue. Other
queuing mechanisms can be trivially represented by interchanging queue models as shown in Figure 5.24.
Figure 5.23b shows how routers are represented using Routing Table and EgressPort submodels.
For RED-enabled routers, egress ports are replaced with the corresponding queuing mechanism as
shown in Figure 5.24.
Similarly, hosts are composed of TCP or UDP data sources and an egress port. Receiver
hosts act as the receiving end of the data. UDP Receivers are simply empty models that act as sinks.
TCP Receivers forward the flow delay and drop attributes required by the TCP sender. The flow
rate ai is not forwarded so as not to affect the queues in the return path, as shown in Figure 5.23c.

(a) EgressPort DEVS coupled model composed of a Queue and a StaticDelay submodel to represent Network Interface Cards (NICs)

(b) Router DEVS coupled model composed of a Routing Table and EgressPort submodels

(c) TCP Receiver DEVS coupled model sets the flow's rate to 0

Figure 5.23: High-Level Fluid-Flow Models

5.5.4 Modeling of Fluid-Flow Topologies


Fluid-flow entities are created at the sources, and later transmitted and manipulated as they
traverse the different models. The path of different flows is defined by the routing tables within
the routers. The communication channels increase the flows’ delay and discard rate, while shaping
their departure rate.
The connections between top DEVS models, which encapsulate fluid-flow equations, define the
underlying complete system of ODEs to be approximated. For the modeler defining a topology,
the complexity of the ODEs is hidden away, as well as the numerical method involved in their
approximation.
Just like packet-level topologies, fluid-flow high-level models are connected using standard
DEVS ports resembling real network ports. Figure 5.24 (center) shows a topology with 6
hosts and 3 Router/Switches defined in the PowerDEVS user interface.

Figure 5.24: Simple Fluid-Flow topology using the PowerDEVS network library. DEVS models encapsulate ODEs and topological connections define the final dynamic system.

We remark that in all cases the topologies look very similar, both in the fluid and the packet-
level models. The network modeler drags and drops (or codes) fluid- or packet-based blocks, and
interconnects them to build full topologies. This is done within the same tool and requires no prior
knowledge about the internal implementations, thus flattening the learning curve. Conversely, in
most existing fluid approaches the topology is implicitly defined by the set of ODEs, with no
graphical view. Also, pre-defined blocks provide different versions of network elements
(e.g. RED or tail-drop queues, UDP or TCP hosts, etc.), which can be easily replaced, fostering
lightweight experimentation with different alternatives.

Figure 5.25: Packet-Level topology. Repeated from Chapter 4 to be compared with the Fluid-Flow topology in Figure 5.24.

5.5.5 Experiments with Fluid-Flow Models


We propose three experiments of increasing complexity to evaluate the proposed approach in terms of
modeling benefits. We verify that simulations yield results qualitatively comparable to the litera-
ture on the MGT model. Fluid-flow models are compared against their packet-level counterparts,
both implemented in PowerDEVS. Finally, we provide a performance analysis.

5.5.5.1 Experiment 1: Single TCP sessions


The first experiment demonstrates the TCP fluid host in a single RED queue scenario. The topol-
ogy for the fluid-flow and packet-level models are shown in Figures 5.26a and 5.26b, respectively.
Two different hosts with throughput controlled by TCP share a common bottleneck link. The
receiver host is configured with two different TCP sessions, one for each of the senders. Links are
modeled as unidirectional, so for completeness Router1R represents the queue in the return path.
The model is configured with 5Mbps links, 10ms propagation delay, and the following parame-
ters for the RED queue: 400Kb buffer, tmin = 300Kb, tmax = 400Kb, pmax = 0.1, and α = 1E−3.
The first host begins with 2 TCP sessions (N = 2), then after 15s the second host starts another
TCP session (N = 1), and finally after 30s the first host closes all connections. The continuous
model is set to start at 1.3s to skip the initialization phase, which is not modeled.
Figure 5.27 qualitatively compares the following metrics: TCP window size (in packet-level:
average over all sessions in a host), hosts' throughputs, buffer sizes (effective and RED estimate),
and departure rate for each flow. As reported for the MGT model, the fluid-flow approximation is meant
to capture the averaged operation of the system. The plots show that the ODE system produces
the same behavioral profile as the packet-level system. In both models it is observed that, when
the number of sessions changes, TCP adapts to share the bottleneck link fairly. As expected,
the packet-level simulation shows a greater variance, but the fluid approximation
follows similar dynamics resembling the averaged behavior. There is a small phase shift between
(a) Fluid-flow topology using TCP hosts (b) Packet-level topology using TCP hosts

Figure 5.26: Topologies for experiment 1.

the two results, probably due to fast-retransmit/fast-recovery (FRFR) not being modeled in the
fluid TCP block. The fluid RED queue follows a similar path as the packet-level queue, both for
effective and estimated buffer sizes (again with a phase shift).
This experiment verifies that the dynamics of the new fluid TCP and RED queue blocks
yield results similar to those reported for the MGT model. The fluid equations approximate the
mean values of common network metrics such as latency, buffer sizes and link utilization, and
additionally follow dynamics similar to the packet-level metrics.

Figure 5.27: Comparison of packet-level and fluid simulations for Experiment 1

5.5.5.2 Experiment 2: Multiple TCP Sessions and Interconnected Queues


The second experiment demonstrates TCP hosts with several sessions traversing interconnected
queues. Figure 5.24 shows the fluid-flow topology, composed of 3 TCP hosts and 2 RED queues.
The packet-level topology, shown in Figure 5.25, is analogous to the fluid topology. Here the queues in
the return path are omitted for simplicity, as they do not contribute queuing delay.
Queues and links are configured as in the previous example. Traffic from the first host traverses
both queues, while traffic from hosts 2 and 3 only traverses queues 1 and 2, respectively. Simulation
starts with 40 TCP sessions per host (120 in total); after 10s the sessions reduce to 10 per host, and
after 20s all 40 sessions per host restart.
Figure 5.28 compares the TCP windows and buffer sizes. Again, the fluid-flow model follows
the expected averaged behavior and exhibits dynamics similar to the packet-level model, which
shows steeper peaks. With 120 TCP sessions, the averaged TCP window and buffer sizes stabilize
around the RED tmin , tmax values. When the TCP sessions reduce to 30, RED is unable to control the
buffer size and the TCP window size oscillates. The fluid equations are able to capture the dynamics
of both the transient and stable phases.
This experiment shows the qualitative accuracy of the TCP equations for a larger number of con-
nections. When using simple data patterns (such as UDP) the fluid queue yields very satisfactory
results. When modeling more complex flows (such as TCP), as reported for the MGT model,
the fluid model yields acceptable approximations. Fluid TCP captures relevant dynamics in send
patterns and key features such as fair link sharing. Depending on the case study, the sacrificed
accuracy is a price worth paying for significant execution speedups. As the modular approach makes
it easy to replace and test different implementations of a same element, a new TCP host that models
FRFR might yield better approximations.

Figure 5.28: Comparison of packet-level and fluid simulation metrics: 2 RED queues and 3 TCP hosts with 120 sessions.

5.5.5.3 Experiment 3: Performance Scalability Analysis


Simulation performance scalability is analyzed for the packet-level and fluid-flow models. The
goal is to verify that our modular approach (using QSS solvers for the fluid case) exhibits similar
scalability results and performance advantages as reported for the MGT model (using discrete-time
solvers). The model in Section 5.5.5.2 is scaled up by a factor of K, increasing proportionally the
link bandwidth, buffer capacity, RED thresholds and number of TCP sessions. We set α inversely
proportional to K, and use 3rd order QSS with accuracy parameters set according to link speeds.
We ran single-threaded simulations for increasing values of K using PowerDEVS 2.2 (Intel Core i7
3.40GHz, 8GB RAM). Figure 5.29 shows that simulation times for the packet-level models scale
linearly with link bandwidth (more packets are required to saturate each link) while remaining almost
flat for fluid models. We verified speedups of up to 200x for 1Gbps links.

Figure 5.29: Packet-level and fluid-flow execution times for Experiment 3

5.6 Hybrid Network Simulation: Integrating Fluid-Flow and Packet-Level models

In previous chapters we showed how packet-level and fluid-flow simulations can be performed
under a unified DEVS formalism and DEVS-based tool. In this Section the focus is on how
these different simulation techniques can interact with each other. Compared with other hybrid
approaches, the one presented here:

• is based on a common formal discrete-event representation of dynamic systems

• does not require multiple overall "rounds" (the two-pass model [9] requires a master
ad-hoc algorithm to scan the full model multiple times)

• does not require smoothing: although smoothing can improve performance, it is not mandatory
to match the dissimilar time representations of the packet-level and fluid-flow solvers [10].

To understand the hybrid approach we turn our attention back to the discussion about lev-
els of abstraction. We have stated that a packet-level model provides a detailed description of a
problem while the fluid-flow approach offers a higher-level, averaged, view. This is certainly true
at a "system" level but, what if we turned this view around only for certain system components?
We could then profit from this flexibility to integrate fluid-flow and packet-level models. We shall
take a closer look at the representation of packets, links and queues.

Simulated packets can carry any fine-grained detail or information required by real-world net-
work protocols. Yet, as with any model, simulated packets are also abstractions of their real-world
counterparts. A key abstraction we will benefit from is the behaviour of a packet when it is fed
into a communication channel. When a packet arrives at the incoming end of a link (assume a
half-duplex channel for the moment), the link model reads the packet size and schedules it to go
out after packetDelay = packetSize/linkBandwidth + linkPropagationDelay.
In the real world things are different. A packet is just a set of bits, to which transmitters
(at NICs) apply complex encoding and/or modulation techniques to transform them into analog
signals.
For the hybrid approach we take an intermediate strategy. We represent packets as a sustained
stream of bits, and use continuous signals to represent a link's ON/OFF state (ON: sending bits,
OFF: not sending). This way, discrete packet arrivals are transformed into piece-wise continuous
signals without loss of accuracy (as would occur if we used smoothing techniques, e.g. as in
[9]). This approach relies heavily on DEVS' ability to represent both discrete events and QSS
continuous signals within the same formal framework. QSS (asynchronous) integration steps and
packet-level (stochastic) arrivals occur at independent time instants, and yet both can interact
seamlessly. This is possible due to the inherent discrete-event nature of QSS, combined with its
dense, piece-wise polynomial representation of signals.
Similarly, a packet-level queue model receives packets and stores them by increasing its buffer
size by the new packet size. The queue model receives the complete packet information (all of its
content) at one sharp instant in time, and the buffer jumps from having size q bits to q + packetSize
bits instantaneously (without intermediate states). This is certainly not possible in real-world
queues, which receive, store and remove streams of bits on a bit-wise basis. This is what the fluid
buffer-server system represents.
Thus, we argue that the model for a packet-level queue does not necessarily imply a more
detailed representation when compared with the fluid-flow queue model. Section 5.5.5 showed
examples on how the fluid-flow queue yields almost exactly the same behaviour as the packet-level
queue when inputs are equivalent.
For the new hybrid approach only the fluid-flow queue representation is used instead of the
packet-level queue. Queue inputs are continuous signals representing bit streams coming either
from the hybrid link (packets transformed into continuous signals) or directly from fluid-flow
models. We reuse the same buffer-server and fluid-flow system models described in Sections 5.3
and 5.5, as well as the same packet-level models described in Chapter 4. This provides a seamless
integration with the hybrid link, while avoiding overflow and underflow issues present in other
hybrid approaches [10].

5.6.1 Data Structures for Hybrid Models


Hybrid models handle both discrete-event packets as well as QSS signals contained in fluid-flow
entities. The HybridFlow data structure, shown in Figure 5.30, neatly represents messages car-
rying information about both types of structures. Just like the Packet and FluidFlow classes, the
HybridFlow class inherits from the IFlow interface. This class inheritance allows sharing common
flow-based behaviour (such as routing information) among packet-level, fluid-flow and hybrid
simulations. This also facilitates high-level modeling, as a same DEVS port can be used to handle
the three types of data structures.
To allow both packet-level and fluid-flow models to retain their original implementation, without
the need to rely on the hybrid structure, the HybridFlow class contains attributes for packets
and their corresponding fluid flow. Using the Entity Attribute Selectors described in Section
5.5.1, packets and fluid flows can be redirected individually into non-hybrid models.

Figure 5.30: HybridFlow class within the complete network data structure hierarchy (IEntity → IFlow → {Packet, HybridFlow, FluidFlow}; HybridFlow holds a shared_ptr&lt;Packet&gt; and a shared_ptr&lt;FluidFlow&gt;)

5.6.2 Turning Discrete Packets Into Continuous Signals: The Hybrid Link

Packet-level simulations abstract the real world at a "packet scale". Real-world packets are not
transmitted as a whole but bit by bit. By unfolding this packet-level abstraction, packets can
be transformed back into a continuous flow of bits at the communication channels.

(a) Block diagram view (b) Example output generated with bandwidth C = 150bps

Figure 5.31: Hybrid link receiving discrete packets and generating fluid-flow continuous signals.

Figure 5.31a shows a block diagram for a hybrid link. It represents a communication channel
receiving discrete packets and emitting hybrid flows with continuous signals. Discrete packets are
transmitted through a link with bandwidth C and propagation delay D. The hybrid link replaces
the bandwidth-delay behaviour, while the propagation delay is still modeled by the packet-level
model. The continuous fluid-flow signal generated can be directly used as an input for a fluid
buffer-server system.
From an algorithmic point of view the idea is very simple: every time a packet p arrives at the
communication channel, it generates a continuous output signal d for the departure rate as follows:

• Upon receiving a discrete packet, d is set equal to the link capacity: d = C.

• The rate d is kept constant for a period of time equal to p.size/C.

• After a time p.size/C has elapsed, the rate is set back to zero: d = 0.

• The discrete packet is sent along with the continuous signal using the HybridFlow data struc-
ture.

Figure 5.31b shows a sample output generated by a hybrid link model configured with band-
width C = 150bps after receiving three packets of different sizes. The generated output value is
either 0 or 150 bps, representing the ON/OFF state of the communication channel. When the first
packet of 22 bits is received, the ON state is set and maintained during 22/150 ≈ 0.1467 seconds. The
OFF state is kept until the second packet of 33 bits arrives, setting the ON state during 33/150 = 0.22
seconds.

5.6.3 Hybrid Queue: Turning Continuous Buffer-Server Metrics into Discrete Events

A new hybrid queue component is defined to merge fluid-flow and packet-level simulations. Figure
5.32 shows how this is implemented as a DEVS coupled model. From the interface point of view,
the hybrid queue receives and sends HybridFlow entities, as defined in Figure 5.30.

On the one hand, the hybrid queue strongly relies on the fluid buffer-server system. The
continuous part of hybrid flows is sent as input to the fluid queue on top of the usual fluid-flows.
In this sense, the buffering is simulated as a full fluid model, which in Section 5.4.3 was shown to
provide excellent accuracy.

Figure 5.32: Hybrid queue DEVS coupled model implemented in PowerDEVS

On the other hand, discrete packets are impacted by the fluid buffer-server queuing delay
and discard probability metrics. The HybridMerge DEVS atomic model shown in Figure 5.32
implements this behaviour. The current queuing delay q(t)/C is directly applied to discrete packets
by delaying the time they depart from the queue. Regarding discards, an arriving packet is
discarded with probability µi (t)/ai (t), sampled from a uniform distribution.
It is important to note here the importance of QSS dense output. The fluid-flow model, as
approximated with QSS methods, produces output trajectories that describe the variables' dynam-
ics at any point in time (not only at time-stepped integration instants). For hybrid models, this
implies that the value of buffer-server metrics (such as the drop rate µ(t)) can be accurately calcu-
lated (within the QSS guaranteed error bounds ∆Qrel , ∆Qmin ) at the exact instant when a discrete
packet arrives at the queue. This allows the QSS buffer-server integrator to produce outputs at
independent time instants, which still interact accurately and straightforwardly with packet arrivals.

5.6.4 Hybrid Topologies


Hybrid topologies are constructed similarly to packet-level and fluid-flow topologies, i.e. by con-
necting high-level blocks. In this case, hybrid queues are placed instead of fluid/packet queues and
hybrid links are used instead of bandwidth delays. For this purpose, the model library includes
hybrid egress ports as shown in Figure 5.33. At the receiver side, the corresponding attribute
selectors should be used to unwind the hybrid flow into the corresponding type (PacketSelector or
FluidSelector for the packet-level and fluid-flow receivers, respectively).
Also similarly to packet-level and fluid-flow topologies, PowerDEVS provides an intuitive
graphical design interface, while tools such as Py2PDEVS or TopoGen provide algorithmic means
to automatically generate much larger, complex topologies too difficult to compose by hand.

Figure 5.33: Hybrid topology: a fluid-flow and a packet-level host sharing a hybrid bottleneck queue

Note that here we chose to build hybrid topologies having every queue in the path accepting
both packet and fluid flows. Another approach that could be explored is to use the same com-
ponents already available to define a subnetwork to be fully packet-level and the rest of the network
fully fluid, and then have packet/fluid interaction only at hybrid edges (similar to what is done in
[9]). In this sense, packet flows could enter a hybrid queue at the edge of the network as they do
now, traverse a sequence of fluid queues, and then be applied the accumulated fluid delay and discard
rates, available only at the other end of the network. This could be achieved with the
same HybridLink and HybridMerge atomic models, but coupling them differently (i.e. using the
HybridMerge only at the final edges of the fluid network).

5.6.5 Controlling Performance Through Smoothing Continuous Signals


Although it is not strictly necessary, smoothing continuous signals can greatly enhance the performance of hybrid models.
The simplest smoothing technique is to calculate an averaged packet rate over a fixed time period. This approach is indispensable in hybrid approaches where packet-level simulations need to interact with classic discrete-time solvers (such as Runge-Kutta), as it turns the discrete-event packet arrivals into rates advertised at discrete-time instants (refer to [9] for a typical example). Smoothing packet arrivals at a fixed rate, however, has the potential to wipe out the impact of packet bursts, a phenomenon very much present and relevant in real-world networks.
We implement smoothing as a DEVS atomic model that calculates averaged packet rates over a fixed time period S and generates changes only if the rate has changed by more than deltamin bits per second. The smoother receives discrete packets and accumulates their sizes in sum. Every S seconds it calculates the bit rate averaged over that period, rate = sum/S. It then generates a continuous signal with value yi = rate if |yi−1 − rate| ≥ deltamin. Otherwise, it does not generate any output signal, as the difference in the averaged rate is considered too small to introduce noticeable errors, while it would contribute to a noticeable performance reduction.
The hybrid model presented here is not restricted to using discrete-time smoothing (in fact it could use no smoothing at all), as both the packet-level and fluid-flow models are represented within a discrete-event formalism. A very simple time-window smoothing is presented in this Thesis, but more complex algorithms could be implemented in the future. Other alternatives could account for the rate of change (first derivative) or produce faster changes in the smoothed rate when a burst of packets arrives.

5.6.6 Experiments with the Hybrid Buffer-Server system


Some experiments are performed to test the hybrid approach in terms of accuracy and performance.
A topology with a packet-level TCP sender, a TCP fluid-flow sender, and a single bottleneck link
is used as shown in Figure 5.33. The packet-level TCP sender uses VectorialDEVS (depicted with
green borders) to create Np TCP sessions. Each TCP session corresponds to an application with infinite data to send, controlled by the TCP protocol, and uses packets of stochastic sizes drawn from an exponential distribution with mean = 4000 bits. The fluid-flow sender uses the TCP fluid Equation (5.56) to determine its throughput, also with a configurable number of sessions Nf. The routing tables define the paths taken by the fluid-flow and packet-level flows so that they traverse a shared bottleneck link (between router1 and router2). At router2, each flow uses a different port to be directed to the corresponding receiver. The propagation delay is set to 1ms in all links, and the bandwidth changes in each experiment.
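Equation (5.56) is not reproduced in this excerpt. As a point of reference, the classic MGT fluid TCP model [24], of which equations of this family are variants, couples the expected congestion window $W(t)$, round-trip time $R(t)$, and loss indication rate $\lambda(t)$ as:

```latex
\frac{dW(t)}{dt} = \frac{1}{R(t)} - \frac{W(t)}{2}\,\lambda(t),
\qquad
\lambda(t) \approx \frac{W(t-\tau)}{R(t-\tau)}\,p(t-\tau),
```

so that a sender aggregating $N_f$ sessions injects fluid at rate $A(t) = N_f\,W(t)/R(t)$, where $\tau$ is the feedback delay and $p$ the drop probability at the congested queue. The exact form of Equation (5.56) used in this Thesis may differ (e.g. the drop compensation parameter $a$ of Equation (5.57) does not appear here).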

5.6.6.1 Experiment 1
In the first experiment, the hybrid queue accuracy is tested when fed with packet-level input only. Testing with fluid-flow input only is skipped, as the hybrid queue is equivalent to the fluid-flow buffer-server when there is no packet-level input. A packet-level and a hybrid router are configured with a 600Kb buffer, 5Mbps bandwidth, 1ms delay, and RED parameters (tmax = 400Kb, tmin = 300Kb, pmax = 0.1, weight = 0.001, samplingPeriod = 0.8ms, the latter for the hybrid router only). Additionally, the QSS integrators in the fluid queue are configured with ∆Qmin = 1E−3 and ∆Qrel = 1E−6. The packet-level host has 10Mbps bandwidth and 1ms delay.
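The RED parameters above (tmin, tmax, pmax, weight) enter the standard RED logic. As orientation, a schematic rendition of classic RED as described in the literature (not necessarily the exact PowerDEVS code) is:

```python
def red_update_average(avg, queue_size, weight):
    """Exponentially weighted moving average of the instantaneous queue size.
    In the hybrid queue this is re-sampled every samplingPeriod seconds."""
    return (1.0 - weight) * avg + weight * queue_size

def red_drop_probability(avg, tmin, tmax, pmax):
    """Classic RED: no drops below tmin, probability growing linearly up to
    pmax between tmin and tmax, certain drop at or above tmax."""
    if avg < tmin:
        return 0.0
    if avg >= tmax:
        return 1.0
    return pmax * (avg - tmin) / (tmax - tmin)
```

With the configuration above, an average queue of 350Kb yields a drop probability of 0.1 · (350 − 300)/(400 − 300) = 0.05.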
Figure 5.34 compares simulation results when the same packet-level input (shown in the top row) is sent to the hybrid router and to the packet-level router. The fluid-flow queue size (shown in the second row) follows the packet-level queue almost perfectly. It could be somewhat surprising that, although the fluid buffer-server system tracks only an averaged behaviour, here it seems to follow short-term variations closely. This is due to the rapid ON/OFF behaviour of the arriving signals, which triggers queue-size updates as dictated by the ∆Qmin and ∆Qrel parameters. Packet discards (shown in the third row) are equivalent and occur in the time ranges when the queue size activates RED discards. Packet discards are stochastic, so exact packet drop times occur at slightly different moments in each router.

[Figure content: three panels over 0–19 s — sender CWND [pkts] (packet); queue size [b] (packet vs. fluid); drops [b] (packet vs. hybrid).]
Figure 5.34: Comparison of hybrid and packet-level queues fed by the same packet-level flow.

5.6.6.2 Experiment 2
In the second experiment, a fluid-flow TCP session is added to test the hybrid queue accuracy when packet-level and fluid flows share the same link.

The fluid-flow host is configured like the packet-level host, with 10Mbps bandwidth, 1ms delay, and a single TCP session (Nf = Np = 1). Figure 5.35 shows the simulation results where both packet-level and fluid-flow TCP sessions share the same hybrid bottleneck queue. From top to bottom, the plots show the TCP congestion window sizes, the sending rate of each host, the queue size and RED estimate of the fluid-flow buffer-server within the hybrid router, and the discrete packet discards together with the fluid discard rate. The packet-level and fluid-flow TCP windows have similar values and dynamics. Some differences are due to the stochastic behaviour of the packet-level flow, as would also happen in full packet-level simulations. For example, at time t ≈ 7, RED increases the discard rate but no actual packet discard takes place; as a consequence, the fluid window falls but the discrete window does not. The second row shows that both TCP hosts adapt to the bottleneck bandwidth, sharing the 10Mbps link fairly, as expected.
[Figure content: four panels over 0–19 s — CWND [pkts] (packet vs. fluid); send rate [bps] (packet vs. fluid); queue size [b] (fluid queue size and RED average); drops [b] (hybrid-fluid vs. packet).]

Figure 5.35: Hybrid simulation with fluid-flow and packet-level flows sharing the same hybrid
bottleneck link.

5.6.6.3 Experiment 3
The third experiment evaluates the hybrid queue performance in a background/foreground traffic
scenario, where a small fraction of the network traffic is the center of the study and is modeled in great detail with the packet-level approach. This detailed traffic is called the probe, or foreground flow. The rest of the traffic, called the background flow, takes the
biggest share of the network bandwidth. The background flow is only interesting to the modeler in
terms of the effects it imposes on the foreground flow, and thus it is modeled using the fluid-flow
approach.
We evaluate the performance of the hybrid queue when increasing the amount of fluid-flow traffic. The packet-level host is set with a fixed bandwidth of 1Mbps. Each execution takes K from [1, 2.5, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000] and sets the router, fluid-flow sender and receiver bandwidth to K Mbps. The router is set to use buffer = K·800Kb, tmin = K·200Kb, tmax = K·500Kb, pmax = 0.01, and weight = 0.001. An equivalent packet-level-only topology is used for comparison. Packet sizes are drawn from an exponential distribution with mean 3000b. 100 seconds are simulated, and the total throughput of both traffic flows is measured in a window from 30s to 90s.
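The per-execution scaling above can be summarized with a small illustrative helper (the names are ours, not the simulator's API; units are bits and bits per second):

```python
def experiment3_config(K):
    """Scale router and background-traffic capacity by K while the
    foreground (packet-level) host stays fixed at 1 Mbps."""
    Mb, Kb = 1_000_000, 1_000
    return {
        "foreground_bw": 1 * Mb,       # fixed packet-level host bandwidth
        "background_bw": int(K * Mb),  # fluid sender/receiver and router
        "buffer":        int(K * 800 * Kb),
        "tmin":          int(K * 200 * Kb),
        "tmax":          int(K * 500 * Kb),
        "pmax":          0.01,
        "weight":        0.001,
    }

K_values = [1, 2.5, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000]
configs = [experiment3_config(K) for K in K_values]
```

The RED thresholds scale with K so that the queue operating regime stays proportionally the same while the fluid share of the traffic grows.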
Figure 5.36a shows the hybrid simulation throughput of the background and foreground traffic in each execution. When the bottleneck link is 1Mbps, the packet-level and fluid-flow TCP connections adapt to share the link fairly. As the router and background server bandwidth increase, the foreground throughput always remains close to 1Mbps (limited by the server bandwidth), taking an ever smaller share of the total throughput.
Figure 5.36b compares the hybrid and full packet-level simulation execution times. The packet-level simulation cost is directly proportional to the number of packets that traverse the network and increases linearly with the bottleneck link bandwidth. In the hybrid simulation, execution times are roughly constant and independent of the bottleneck bandwidth: the fluid-flow throughput increases with the bottleneck link bandwidth, but the number of generated packets stays about the same. Fluid-flow simulation times are independent of the link bandwidth, and thus in hybrid simulations the performance is driven by the packet-level throughput share. On the other hand, hybrid simulation performance is worse than that of the full packet-level simulation when the bottleneck bandwidth is small and the fluid-flow share is comparable to the packet-level share. This is studied further in the next example.

(a) Throughput share between foreground and background traffic

(b) Execution times of the packet-level and hybrid models for 100s of simulation.

Figure 5.36: Hybrid simulation results for the foreground/background traffic experiment. Foreground traffic is represented with packet-level models and background traffic with fluid-flow models. The packet-level server bandwidth is set to 1Mbps.

5.6.6.4 Experiment 4
The fourth experiment studies the accuracy and performance of the hybrid queue when interacting with an increasing amount of packet-level traffic. The bandwidth is fixed at 200Mbps for the packet-level and fluid-flow hosts, and the router is the bottleneck with a bandwidth of 100Mbps. In every execution there are 40 simultaneous TCP sessions, where the packet-level host is configured with Np TCP sessions and the fluid-flow host with Nf = 40 − Np TCP sessions. The fluid-flow TCP window is configured with a drop compensation parameter a = 2 (refer to Equation (5.57)). The router is set to use buffer = 5Mb, tmin = 300Kb, tmax = 5Mb, and pmax = 0.005. An additional hybrid simulation uses a smoother configured with deltamin = 1 (a threshold low enough to be effectively disabled) and a 10ms window, matching the propagation delay. An equivalent packet-level-only topology (40 packet-level TCP sessions and a packet-level router) is used for comparison, with the same stochastic packet sizes as in the previous experiment. 70 seconds are simulated, and the total throughput of both traffic flows is measured from 10s onwards.
Figure 5.37 shows the smoothed hybrid simulation metrics when there are 20 packet-level and 20 fluid-flow TCP sessions. In the first plot, the 20 packet-level windows are averaged, and a single packet-level probe is included to verify the behaviour of detailed foreground traffic. The fluid-flow congestion window approximates the packet-level average very closely. The plot in the second row shows how the throughputs of both types of traffic interact to quickly share the 100Mbps link. The fluid-flow rate shows fast fluctuations in spite of the smoothing and mean-value foundations. This is a consequence of the queue size, shown in the third row, which is affected by the hybridized traffic changing its rate according to the smoothing.
Figure 5.38a shows the total throughput of the packet-level and fluid-flow hosts for the smoothed hybrid model. The figure shows different executions increasing the percentage of packet-level TCP sessions Np. Results show that the percentage of packet-level throughput increases proportionally to the number of packet-level TCP sessions. This supports the fact that, in the hybrid simulation with smoothing, both types of flow have similar behaviour and adapt to share the bottleneck link. The hybrid simulation without smoothing yields very similar results, which further validates the fluid-flow model and additionally shows the ability of the hybrid queue to correctly simulate the interaction between fluid-flow and packet-level traffic.
Figure 5.38b compares the execution times of the hybrid simulation with smoothing (in green) and without smoothing (in blue) as the number of packet-level TCP sessions Np increases. The full packet-level model execution time with 40 TCP sessions is also included for comparison (in red). In this scenario, the hybrid simulation without smoothing quickly becomes slower than the full packet-level simulation, rendering it useless (slower and less detailed). A high rate of discrete packets generates a high number of ON/OFF state changes in the fluid-flow queue within the hybrid router; the fluid-flow queue then performs more operations (QSS approximations) than the packet-level queue, which just enqueues and dequeues packets. When the percentage of packet-level traffic is high compared to the fluid-flow traffic, it is worthwhile to smooth the ON/OFF state changes of the hybridized flow. Figure 5.38b shows the performance advantage brought about by using a smoothing window of 10ms (in green): the hybrid execution time is greatly reduced and stays below that of the packet-level simulation even when only packets are simulated. As expected, the hybrid simulation performance is still driven by the number of packets traversing the network. Compared to the packet-level simulation, the experiment shows that the hybrid simulation achieves a speedup of 5.9 when there is 10% packet-level traffic and 90% fluid-flow traffic.

[Figure content: three panels over 0–70 s — CWND [pkts] (packet-level probe, average packet-level CWNDs, fluid-flow CWND); throughput [Mbps] (packet-level probe, packet-level total, fluid-flow total); queue size [Mb] (hybrid queue size and RED estimate).]

Figure 5.37: Hybrid simulation metrics with 20 packet-level and 20 fluid-flow TCP sessions using
a smoothing window of 10ms.

[Figure content: total throughput (Mb, 30s to 70s) of packet-level vs. fluid-flow traffic as the number of packet-level TCP sessions varies from 0 to 40.]
(a) Throughput share between packet-level and fluid-flow traffic

[Figure content: simulation execution time (s) vs. number of packet-level TCP sessions (0 to 40, out of 40 total) for the hybrid model without smoothing, the hybrid model with smoothing = 0.01, and the packet-level model.]
(b) Execution times of the packet-level and hybrid models, with and without smoothing, for 70s of virtual simulation time and an aggregated throughput of ∼7Gb.

Figure 5.38: Hybrid simulation results increasing the percentage of packet-level traffic. A smoothing window of 10ms is used.

5.7 Conclusions
We presented a novel simulation approach to unify the experience of designing network models
both with fluid- and packet-level techniques.
Under the DEVS M&S framework, both types of models boil down to a common discrete-event simulation and rely on the same mathematical framework, modeling methodology, and practical tool, thus reducing the learning curve and simplifying model description.
A novel hybrid approach was also presented to integrate the packet-level and fluid-flow models, profiting from the DEVS representation and the QSS properties.
A prototypical fluid buffer-server system was characterized as a system of Retarded Functional Differential Equations (RFDE) with implicit delays, and proposals to tackle the challenges of its numerical solution were described. A new FDQSS method was developed to support the numerical approximation of retarded FDEs with implicit delays. New modular blocks are now available in the PowerDEVS continuous library and can be easily reused in other continuous dynamic systems.
Besides, a new QSS Bounded Integrator was presented for handling sharp discontinuities efficiently, by enforcing the maximum and minimum integration boundaries present in fluidified variables such as queue sizes and congestion window sizes.
Also, a new set of modeling data structures for DEVS was developed, that allows the simul-
taneous manipulation of multiple QSS signals and attributes as a group of related Fluid Entities.
This greatly simplifies the modeling of complex continuous systems with routing schemes across
network topologies.
Leveraging the above listed contributions, we presented a new modular approach to modeling fluid-flow networks which allows users to build network topologies without dealing directly with the underlying differential equations, nor with the numerical methods to solve them. We hope this facilitates the adoption of fluid-flow models in network engineering projects, where they are usually hard to incorporate since very different mathematical backgrounds and tools are required.
Table 5.8 compares features of the available fluid and hybrid models with those of our approach.
Experimentation with canonical network scenarios corroborated that the presented models and tools provide flat simulation times when the total throughput is increased for fluid-based abstractions, compared to the linearly increasing simulation times of the packet-based models. Fluid-flow models offer accurate approximations of packet metrics across the different experiments.
Experiments corroborated our initial conjectures that hybrid models with QSS do not require
ad-hoc clock synchronization or smoothing techniques, retain the performance advantages of the
fluid-model, and provide detailed packet-level tracking features.
Meanwhile, the strategy of developing libraries of reusable, block-oriented and self-contained
models proved successful: the visual design of network topologies can now be implemented almost
indistinguishably from the underlying representation of the network, be it fluid- or packet-based.
MGT [24]
  Fluid-flow technique/solver: discrete-time Runge-Kutta.
  Network specification: ODEs coupled with a numerical solver (C++).
  Packet-level technique/simulator: discrete events; ns-2, pdns, backplane.
  Clock synchronization: fixed intervals.
  Packet-fluid integration: two-pass model; smoothing of packets; fixed updates on fluid metrics; interactions at network boundaries.

JIZT [10]
  Fluid-flow technique/solver: discrete-time Runge-Kutta (GPU).
  Network specification: ODEs coupled with a numerical solver (CUDA + C++).
  Packet-level technique/simulator: discrete events; PRIME (CPU).
  Clock synchronization: fixed intervals; batch updates (optional).
  Packet-fluid integration: fix-up computation for the "under/over flow problem"; integration at each network queue.

FluidSim [26]
  Fluid-flow technique/solver: discrete-event, piecewise-constant rate-based.
  Network specification: ATM networks; custom scheduler.
  Packet-level technique/simulator: no.
  Clock synchronization: no.
  Packet-fluid integration: no.

IP-TN [168]
  Fluid-flow technique/solver: discrete-event, piecewise-constant rate-based.
  Network specification: IP-TN interface; open-loop (ON/OFF) sources only.
  Packet-level technique/simulator: discrete events; IP-TN simulator.
  Clock synchronization: no.
  Packet-fluid integration: integration at each network queue.

CK15 [81]
  Fluid-flow technique/solver: discrete events (DEVS), QSS family.
  Network specification: PowerDEVS blocks; single hybrid queue, single hybrid flow.
  Packet-level technique/simulator: discrete events (DEVS); PowerDEVS.
  Clock synchronization: no.
  Packet-fluid integration: hybrid packets; packets do not affect fluid.

BC19 (this Thesis)
  Fluid-flow technique/solver: discrete events (DEVS), QSS family.
  Network specification: PowerDEVS network blocks.
  Packet-level technique/simulator: discrete events (DEVS); PowerDEVS.
  Clock synchronization: no.
  Packet-fluid integration: hybridization at each network queue; topologies of hybrid queues; smoothing (optional).

Table 5.8: Comparison of fluid-flow and hybrid models
Summary: Simulation via Fluid Approximations

This chapter describes fluid-approximation and hybrid simulation models under the DEVS formalism. The new modular approach unifies the design of network topologies for the packet-level, fluid-approximation, and hybrid techniques. Under the DEVS M&S framework these models reduce to a unified discrete-event simulation and rely on the same mathematical framework, the same modeling methodology, and the same practical tool, which reduces the learning curve and simplifies model description.

The chapter begins by describing the buffer-server system, the central piece of the fluid models. This system is characterized as a Retarded Functional Differential Equation with implicit delays (RFDE), and new solutions for its numerical approximation are proposed. A new Forward Delay QSS (FDQSS) method for the simulation of RFDEs and a new bounded QSS integrator for the efficient and accurate handling of discontinuities are presented (Section 5.3).

Next, new data structures for DEVS modeling are introduced, which allow manipulating multiple attributes and QSS signals as a single fluid entity (see Section 5.5.1). This greatly simplifies the modeling of complex continuous systems with routing schemes across topologies.

Leveraging the previous contributions, a new modular approach for modeling networks via fluid approximations is presented. The new approach allows users to create network topologies without dealing directly with the underlying differential equations, nor with the numerical methods to solve them (see Section 5.5). The goal is to facilitate the adoption of fluid models in network engineering projects, where they are traditionally difficult to incorporate since very different tools and mathematical backgrounds are required.

It is then corroborated that fluid models provide flat simulation times as link capacities increase, compared with the linearly increasing simulation times of packet-level models (see Figure 5.29). The fluid models offer accurate approximations when compared against packet-level metrics in different experiments (see Figures 5.13, 5.27, and 5.28).

Finally, a new hybrid approach is presented that integrates the packet-level and fluid models, profiting from the DEVS representation and the QSS properties (see Section 5.6). Several experiments corroborate that hybrid models with QSS require neither ad-hoc smoothing techniques nor simulation-clock synchronization. Additionally, they retain the performance advantages of the fluid models (see Figure 5.36b) and provide detailed traces of the packet models (see Figures 5.34 and 5.35), allowing the accuracy-versus-performance trade-off to be adjusted according to requirements (see Figure 5.36a).

Chapter 6

Conclusions and Future Work

In this Thesis we presented new techniques for the development of packet-level, fluid-flow, and
hybrid network simulation models under the unifying DEVS formalism. The introduced simulation
techniques, their theoretical support, and practical tools allow users to seamlessly specify network
topologies at different abstraction levels, in a modular and hierarchical way.
The novel library for hybrid simulation enables the mutual influence of fluid-flow and packet-level models under the same simulation technique and tool, without resorting to ad-hoc time synchronization or traffic smoothing techniques.
Chapter 4 showed that packet-level models are well suited to study detailed network performance and communication patterns in applications and protocols. The highly distributed TDAQ system at CERN served as a challenging motivating case study. In this context, experiments showed that simulation models can reproduce network behaviour at the protocol, topology, and application levels. The packet-level approach proved useful to study detailed communication patterns in the TCP protocol and TDAQ applications, which were validated against real metrics taken from the TDAQ system (Section 4.4.3). Additionally, simulations supported the design of future DAQ network upgrades (Section 4.3.3.3), served for parameter tuning in the HLT application, and helped with what-if scenarios to analyze load balancing schemes aiming at reducing the critical event filtering latency.
In Chapter 5, fluid-flow models were presented as an alternative to packet-level models, mainly to tackle scalability challenges in simulation execution times for high-speed network scenarios. Experiments showed that while packet-level simulation times grow linearly with the network size and throughput, the performance of fluid-flow models is insensitive to the overall throughput. This can provide great advantages for the simulation of future Ethernet speeds.
Additionally, the modeling approach for fluid-flow models, presented in Section 5.5, showed
that topologies can be created graphically (or generated automatically), similarly to packet-level
topologies, without dealing directly with the underlying differential equations or their numerical
solvers. This reduces the learning curve for the utilization of fluid-flow models, as the mathematics
involved are encapsulated within each fluid/hybrid queue.
Although fluid-flow models do not capture packet-by-packet information, results in Section 5.5.5 confirmed that Ordinary Differential Equations are able to accurately capture mean averaged behaviour as well as transient protocol dynamics.
The hybrid modeling approach presented in Section 5.6 was developed building on the previous,
standalone packet- and fluid-based libraries. Results showed that the hybrid models are a promis-
ing solution for the performance evaluation of current and future large-scale high-speed networks.


Experiments in Section 5.6.6 showed that hybrid models are able to retain the performance advantages of fluid-flow models (i.e. simulation execution time independent of link bandwidth) while providing detailed simulation traces for packet-level models of selected sources (i.e. packet-by-packet communications can be studied). The same modeling techniques and tools as for packet-level and fluid-flow models are used to create hybrid topologies (see Section 5.6.4), thus not increasing the learning curve for network experts incorporating hybrid models.
A novelty of the new hybrid approach lies in the fact that the discrete and continuous portions of the networks are guided by a common simulation scheduler, namely the DEVS abstract simulator. As a consequence, both types of simulation interact smoothly, without calling for ad-hoc time synchronization algorithms (also termed co-simulation) that can only be useful for particular integrations of specific software packages, as shown in Section 5.6.2.

Supporting the development of packet-level, fluid-flow and hybrid models, contributions in


other areas were presented in this Thesis.
In Chapter 4, packet-level simulations were used to study systematically load balancing strate-
gies guided by recent results for insensitive load balancing schemes. Experiments from Section
4.5.2.3 suggest that there is a class of policies for which a common critical regime can be identified
and interpreted as a generalization of the Halfin-Whitt-Jagerman regime for one-server systems.
This can provide insights for future theoretical studies on the asymptotic limits of load-balancing
schemes. These findings motivated alternative load balancing control logic in the TDAQ model,
implemented in Section 4.5.3. Simulation predictions for the proposed changes were then matched
with network measurements in the real TDAQ system in Section 4.5.3.2.
In Chapter 5, a detailed study of a general fluid buffer-server system led to the characterization of the mathematical model as a system of Retarded Functional Differential Equations (RFDE) with implicit delays. To solve this class of systems, we developed a new general-purpose Forward Delay
QSS (FDQSS) method, added to the family of QSS numerical methods and to the PowerDEVS
library of continuous systems. The necessary proofs were given in Section 5.4.1 showing that under
reasonable assumptions FDQSS provides guarantees of stability and convergence at any desired
accuracy.
Finally, we conclude that the development process and methodology defined at the beginning of this Thesis (see Chapter 3) fulfilled the expectations. The advancements made at both theoretical and practical levels are generic enough to benefit modelers of data networks and users of PowerDEVS at large. Notwithstanding, all developments were done keeping pace with the evolution of the ATLAS TDAQ system at CERN, proving effective to deliver answers to different real-world questions, with enough flexibility to decide the level of detail required for each question and the time-to-delivery required by the project's schedule.

6.1 Open problems and future work


A key item of future work is to study the ATLAS TDAQ network at CERN using fluid and hybrid models. Although successful results for the TDAQ system were obtained using packet-level simulations, the trade-off between accuracy and performance in the specific context of TDAQ, using the technology developed in this Thesis, remains to be determined.
In particular, suitable ODEs should be found to represent background traffic. Monitoring and configuration traffic could be well represented by the TCP sources developed here, while ROS-to-DCM traffic is better modeled by ON/OFF bursty sources similar to the UDP models developed here.

Regarding the hybrid models, although experiments showed promising results, there are a number of issues that deserve further study; for instance, the trade-off between accuracy and performance for different parameters and smoothing techniques. Also, the performance of tandems of queues should be characterized, in particular focusing on the fact that changes in departure rates at one fluid queue could impose an undesired "ripple effect" on queues downstream [169].

There are also several alternative TCP fluid models in the literature which might provide better approximations in different scenarios. A promising model to implement in the context of QSS is the one described in [25], which provides a fluid-flow TCP representation with different states according to the connection phase (called hybrid in the sense that its ODEs have many discontinuous states).

Another line worth pursuing is to characterize the impact of QSS on network topologies, owing
to its asynchronous nature. With classic discrete-time methods, the equations describing uncongested
links (or inactive hosts) take as much computing time to solve as those for congested links (or active
hosts), since all links and hosts must be visited by the discrete-time scheduler at each synchronous
step. Conversely, QSS balances computing efforts dynamically, as each integrator (e.g. a fluid
queue) adapts its activity to the dynamics required at each phase. If no dynamics
are present, QSS performs no new computations. That is, QSS efficiently handles alternating
busy/free periods at each queue, without requiring special model-reduction analyses as is done
e.g. in [24]. In fact, a subtype of QSS methods known as Linearly Implicit QSS (LIQSS) [74] provides
formal guarantees of exactly zero computations for a state variable that has reached equilibrium. This
strongly suggests that further studies on applying LIQSS to the fluid and hybrid models
presented in this Thesis are worthwhile.
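The asynchronous-activity argument can be made concrete with a toy QSS1 calculation (illustrative code, unrelated to the actual PowerDEVS implementation). For dx/dt = u(t) with piecewise-constant input, a QSS1 integrator with quantum ΔQ emits one event per quantum crossing, i.e. every ΔQ/|u| time units, and emits nothing at all while u = 0:

```python
def qss1_events(u_segments, dq=0.1, x0=0.0):
    """Count QSS1 quantization events for dx/dt = u(t), with u(t) piecewise
    constant over the given (duration, value) segments. Each event marks the
    state crossing a quantum boundary dq."""
    x, events = x0, 0
    for duration, u in u_segments:
        if u == 0.0:
            continue                        # no dynamics: QSS schedules nothing
        # with constant slope u, quantum crossings occur every dq/|u| seconds
        events += round(abs(u) * duration / dq)
        x += u * duration
    return events

# A "busy" queue (nonzero net input rate) vs. an idle queue in equilibrium:
busy = qss1_events([(1.0, 5.0), (1.0, -5.0)])   # fills up, then drains
idle = qss1_events([(2.0, 0.0)])                # input rate matches output rate
print(busy, idle)  # prints: 100 0
```

The idle queue costs exactly zero computations over its whole quiescent period, which is the behavior a synchronous discrete-time scheduler cannot exploit without explicit model reduction.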
To conclude, the possibility of having continuous and discrete models of networks interacting
under a common representation could potentially be exploited in areas beyond computer
networks. A closely related example is city traffic flow control [170], where similar
queuing dynamics arise. Moreover, cross-domain models should become easier to represent,
e.g. combining city traffic with mobile networked devices, currently a hot research topic
in the Internet of Things for Smart Cities.
Bibliography

[1] A. Collaboration, “The ATLAS experiment at the CERN large hadron collider,”
Journal of Instrumentation, vol. 3, no. 08, S08003, 2008.
[2] S. Fernandes, Performance evaluation for network services, systems and protocols.
Springer, 2017, isbn: 3319545191.
[3] K. Wehrle, M. Günes, and J. Gross, Modeling and tools for network simulation.
Springer, 2010.
[4] L. Kocarev and G. Vattay, Eds., Complex dynamics in communication networks,
Springer, 2005, isbn: 9783540243052.
[5] F. E. Cellier and E. Kofman, Continuous system simulation. Springer Science &
Business Media, 2006.
[6] S. McKee and A. Collaboration, “Networks in ATLAS,” Journal of Physics: Confer-
ence Series, vol. 898, no. 05, p. 052 006, 2017, issn: 1742-6596. doi: https://doi.
org/10.1088/1742-6596/898/5/052006.
[7] B. P. Zeigler, A. Muzy, and E. Kofman, Theory of modeling and simulation 3rd edi-
tion: Discrete event and iterative system computational foundations. Elsevier, 2018.
[8] E. Kofman and S. Junco, “Quantized-state systems: A devs approach for continu-
ous system simulation,” Transactions of The Society for Modeling and Simulation
International, vol. 18, no. 3, pp. 123–132, 2001.
[9] Y. Gu, Y. Liu, and D. Towsley, “On integrating fluid models with packet simulation,”
in INFOCOM 2004. Twenty-third AnnualJoint Conference of the IEEE Computer
and Communications Societies, IEEE, vol. 4, 2004, pp. 2856–2866.
[10] J. Liu, Y. Liu, Z. Du, and T. Li, “Gpu-assisted hybrid network traffic model,” in Proc.
of the 2nd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation,
ACM, 2014, pp. 63–74.
[11] M. Bonaventura, D. Foguelman, and R. Castro, “Discrete event modeling and simulation-
driven engineering for the atlas data acquisition network,” IEEE Computing in Sci-
ence & Engineering, vol. 18, no. 3, pp. 70–83, 2016.
[12] M. Bonaventura, M. Jonckheere, and R. Castro, “Simulation study of dynamic load
balancing for processor sharing servers with finite capacity under generalized halfin-
whitt-jagerman regimes,” in Proceedings of 2018 Winter Simulation Conference (WSC),
2018.


[13] M. Bonaventura and R. Castro, “Fluid-flow and packet-level models of data net-
works unified under a modular/hierarchical framework: Speedups and simplicity,
combined,” in Proceedings of 2018 Winter Simulation Conference (WSC), 2018.
[14] D. J. Foguelman, M. A. Bonaventura, and R. D. Castro, “Masada: A modeling and
simulation automated data analysis framework for continuous data-intensive vali-
dation of simulation models,” in 30TH annual European Simulation and Modeling
Conference, (SIANI, University of Las Palmas, Spain, Oct. 26–28, 2016), vol. 30,
2016, pp. 34–42.
[15] A. Laurito, M. Bonaventura, M. Eukeni Pozo Astigarraga, and R. Castro, “Topogen:
A network topology generation architecture with application to automating simula-
tions of software defined networks,” in 2017 Winter Simulation Conference (WSC),
2017, pp. 1049–1060. doi: 10.1109/WSC.2017.8247854.
[16] M. Bonaventura, “Unified packet-level and fluid-flow simulation of large-scale net-
works,” in Ph.D. Colloquium of 2018 Winter Simulation Conference (WSC), ACM
SIGSIM Best Ph.D. Student Paper Award, 2018. [Online]. Available: https://www.
acm-sigsim-mskr.org/bestPhDpaperAwardRecipients.htm.
[17] L. Santi and M. Bonaventura, Py2pdevs: A python to powerdevs interface, https:
//gitlab.cern.ch/tdaq-simulation/powerdevs/, 2018.
[18] S. McCanne, “Network simulator ns-2,” http://www.isi.edu/nsnam/ns/, 1997.
[19] G. Carneiro, “Ns-3: Network simulator 3,” in UTM Lab Meeting April, vol. 20, 2010.
[20] A. Varga and R. Hornig, “An overview of the omnet++ simulation environment,”
in Proceedings of the 1st international conf. on Simulation tools and techniques for
communications, networks and systems, ICST, 2008, p. 60.
[21] I. Katzela, Modeling and simulating communication networks: A hands-on approach
using opnet. Prentice Hall PTR, 1998.
[22] R. Barr, Z. J. Haas, and R. van Renesse, “Jist: An efficient approach to simulation
using virtual machines,” Software: Practice and Experience, vol. 35, no. 6, pp. 539–
576, 2005.
[23] E. D. Ngangue Ndih and S. Cherkaoui, “Simulation methods, techniques and tools
of computer systems and networks,” in Modeling and Simulation of Computer Net-
works and Systems: Methodologies and Applications, M. S. Obaidat, F. Zarai, and
P. Nicopolitidis, Eds., Morgan Kaufmann, 2015.
[24] Y. Liu, F. Lo Presti, V. Misra, D. Towsley, and Y. Gu, “Fluid models and solutions
for large-scale ip networks,” in ACM SIGMETRICS Performance Evaluation Review,
ACM, vol. 31, 2003, pp. 91–101.
[25] S. Bohacek, J. a. P. Hespanha, J. Lee, and K. Obraczka, “A hybrid systems mod-
eling framework for fast and accurate simulation of data communication networks,”
in Proceedings of the 2003 ACM SIGMETRICS Conference on Measurement and
Modeling of Computer Systems, ACM, vol. 31, 2003, pp. 58–69, isbn: 1-58113-664-1.
doi: 10.1145/781027.781036.

[26] J. Incera, R. Marie, D. Ros, and G. Rubino, “Fluidsim: A tool to simulate fluid models
of high-speed networks,” in International Conference on Modelling Techniques and
Tools for Computer Performance Evaluation, Springer, 2000, pp. 230–246.
[27] M. S. Obaidat and N. A. Boudriga, Fundamentals of performance evaluation of com-
puter and telecommunication systems. Wiley-Interscience, 2010, isbn: 0471269832.
[28] S. Keshav, Mathematical foundations of computer networking. Addison-Wesley, 2012.
[29] M. Harchol-Balter, Performance modeling and design of computer systems: Queueing
theory in action, 1st. New York, NY, USA: Cambridge University Press, 2013, isbn:
1107027500, 9781107027503.
[30] R. Beuran, Introduction to network emulation. Pan Stanford, 2012, isbn: 9814310913.
[31] R. Srikant, The mathematics of internet congestion control. Springer Science & Busi-
ness Media, 2012.
[32] R. W. R. Darling and J. R. Norris, “Differential equation approximations for markov
chains,” Probability Surveys, vol. 5, p. 37, 2008. doi: doi:10.1214/07-PS121.
[33] G. Grimmett and D. Stirzaker, Probability and random processes. Oxford university
press, 2001.
[34] D. J. Wilkinson, Stochastic modelling for systems biology. Chapman and Hall/CRC,
2006.
[35] H. Andersson and T. Britton, Stochastic epidemic models and their statistical anal-
ysis. Springer Science & Business Media, 2012, vol. 151.
[36] J.-Y. Le Boudec, Performance evaluation of computer and communication systems.
Epfl Press, 2011.
[37] L. Bortolussi and N. Gast, “Mean-field limits beyond ordinary differential equations,”
in International School on Formal Methods for the Design of Computer, Communi-
cation and Software Systems, Springer, 2016, pp. 61–82.
[38] T. G. Kurtz, “Solutions of ordinary differential equations as limits of pure jump
markov processes,” Journal of applied Probability, vol. 7, no. 1, pp. 49–58, 1970.
[39] M. Benaim and J.-Y. Le Boudec, “A class of mean field interaction models for
computer and communication systems,” Performance evaluation, vol. 65, no. 11-12,
pp. 823–838, 2008.
[40] G. Bolch, S. Greiner, H. De Meer, and K. S. Trivedi, Queueing networks and markov
chains: Modeling and performance evaluation with computer science applications.
John Wiley & Sons, 2006.
[41] E. Alliance, Ethernet roadmap, www.ethernetalliance.org/roadmap, Accessed:
2018-04-04, 2015.
[42] L. Kleinrock, “Models for computer networks,” in Proc. of the IEEE International
Conference on Communications, Boulder, Colorado, 1969, pp. 21/9–21/16.
[43] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, “Modeling tcp throughput: A simple
model and its empirical validation,” ACM SIGCOMM Computer Communication
Review, vol. 28, no. 4, pp. 303–314, 1998.

[44] V. Misra, W.-B. Gong, and D. Towsley, “Fluid-based analysis of a network of aqm
routers supporting tcp flows with an application to red,” in ACM SIGCOMM Com-
puter Communication Review, ACM, vol. 30, 2000, pp. 151–160.
[45] X. Liu, E. Chong, and N. Shroff, “A framework for opportunistic scheduling in wire-
less networks,” Comp. Netw., vol. 41, pp. 451–474, 2003.
[46] S. Kunniyur and R. Srikant, “End-to-end congestion control schemes: Utility func-
tions, random losses and ecn marks,” IEEE/ACM Transactions on Networking (TON),
vol. 11, no. 5, pp. 689–702, 2003.
[47] M. A. Marsan, M. Garetto, P. Giaccone, E. Leonardi, E. Schiattarella, and A. Tarello,
“Using partial differential equations to model tcp mice and elephants in large ip
networks,” IEEE/ACM Transactions on Networking, vol. 13, no. 6, pp. 1289–1301,
2005.
[48] J. C. Butcher, “The numerical analysis of ordinary differential equations: Runge-
kutta and general linear methods,” 1987.
[49] J. D. Lambert, Numerical methods for ordinary differential systems: The initial value
problem. John Wiley & Sons, Inc., 1991.
[50] E. Hairer, S. P. Nørsett, and G. Wanner, Solving ordinary differential equations i:
Nonstiff problems (springer series in computational mathematics) (v. 1). Springer,
2011, isbn: 3540566708.
[51] L. F. Shampine and M. W. Reichelt, “The matlab ode suite,” SIAM journal on
scientific computing, vol. 18, no. 1, pp. 1–22, 1997.
[52] P. Fritzson and V. Engelson, “Modelica—a unified object-oriented language for sys-
tem modeling and simulation,” in European Conference on Object-Oriented Program-
ming, Springer, 1998, pp. 67–90.
[53] B. Gough, Gnu scientific library reference manual. Network Theory Ltd., 2009.
[54] T. E. Oliphant, “Python for scientific computing,” Computing in Science & Engi-
neering, vol. 9, no. 3, 2007.
[55] W. Perruquetti and J.-P. Barbot, “Tools for ordinary differential equations analysis,”
in Chaos in Automatic Control, CRC Press, 2005, pp. 65–119.
[56] D. R. Willé and C. T. Baker, “Delsol—a numerical code for the solution of systems of
delay-differential equations,” Applied numerical mathematics, vol. 9, no. 3-5, pp. 223–
234, 1992.
[57] L. F. Shampine and S. Thompson, “Solving ddes in matlab,” Applied Numerical
Mathematics, vol. 37, no. 4, pp. 441–458, 2001.
[58] S. Corwin, S Thompson, and S. White, “Solving odes and ddes with impulses,” JNA-
IAM J. Numer. Anal. Indust. Appl. Math, vol. 3, pp. 139–149, 2008.
[59] L. Euler, “De integratione æquationum differentialium per approximationem,” in
Opera Omnia, Institutiones Calculi Integralis, Teubner Verlag, Leipzig, Germany,
1913, 424–434.

[60] D. G. Bettis, “Efficient embedded runge-kutta methods,” in Numerical Treatment
of Differential Equations, Springer, 1978, pp. 9–18.
[61] K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical solution of initial-value
problems in differential-algebraic equations. Siam, 1996, vol. 14.
[62] B. P. Zeigler, H. Praehofer, and T. G. Kim, Theory of modelling and simulation.
John Wiley & Sons, New York, 1976, vol. 7.
[63] ——, Theory of modeling and simulation: Integrating discrete event and continuous
complex dynamic systems. Academic press, 2000.
[64] G. A. Wainer and P. J. Mosterman, Discrete-event modeling and simulation: Theory
and applications. CRC Press, 2010.
[65] J.-B. Filippi, M Delhom, and F Bernardi, “The jdevs modelling and simulation en-
vironment,” 2002.
[66] F. Bergero and E. Kofman, “Powerdevs: A tool for hybrid system modeling and
real-time simulation,” Simulation, vol. 87, no. 1-2, pp. 113–132, 2011.
[67] A. Zengin and H. Sarjoughian, “Devs-suite simulator: A tool teaching network pro-
tocols,” in Simulation Conference (WSC), Proceedings of the 2010 Winter, IEEE,
2010, pp. 2947–2957.
[68] C. Seo, B. P. Zeigler, R. Coop, and D. Kim, “Devs modeling and simulation method-
ology with ms4 me software tool,” in Proceedings of the Symposium on Theory of
Modeling & Simulation-DEVS Integrative M&S Symposium, Society for Computer
Simulation International, 2013, p. 33.
[69] G. Wainer and N. Giambiasi, “Timed cell-devs: Modeling and simulation of cell
spaces,” in Discrete event modeling and simulation technologies, Springer, 2001, pp. 187–
214.
[70] J. Himmelspach, R. Ewald, S. Leye, and A. M. Uhrmacher, “Parallel and distributed
simulation of parallel devs models,” in Proceedings of the 2007 spring simulation
multiconference-Volume 2, Society for Computer Simulation International, 2007, pp. 249–
256.
[71] F. Bergero and E. Kofman, “A vectorial devs extension for large scale system mod-
eling and parallel simulation,” Simulation, vol. 90, no. 5, pp. 522–546, 2014.
[72] E. Kofman, “A third order discrete event method for continuous system simulation,”
Latin American applied research, vol. 36, no. 2, pp. 101–108, 2006.
[73] G. Migoni, E. Kofman, and F. Cellier, “Quantization-based new integration methods
for stiff ordinary differential equations,” Simulation, vol. 88, no. 4, pp. 387–407, 2012.
[74] G. Migoni, M. Bortolotto, E. Kofman, and F. E. Cellier, “Linearly implicit quantization-
based integration methods for stiff ordinary differential equations,” Simulation Mod-
elling Practice and Theory, vol. 35, pp. 118–136, 2013.
[75] H. K. Khalil and J. Grizzle, Nonlinear systems. Prentice hall Upper Saddle River,
NJ, 2002, vol. 3.

[76] E. Kofman, “A second-order approximation for devs simulation of continuous
systems,” Simulation, vol. 78, no. 2, pp. 76–89, 2002.
[77] ——, “Relative error control in quantization based integration,” Latin American ap-
plied research, vol. 39, no. 3, pp. 231–237, 2009.
[78] X. Floros, F. Bergero, F. E. Cellier, and E. Kofman, “Automated simulation of mod-
elica models with qss methods: The discontinuous case,” in Proceedings of the 8th
International Modelica Conference; March 20th-22nd; Technical Univeristy; Dresden;
Germany, Linköping University Electronic Press, 2011, pp. 657–667.
[79] E. Kofman, J. S. Lee, and B. P. Zeigler, “Devs representation of differential equation
systems. review of recent advances,” Proceedings of ESS’01, pp. 591–595, 2001.
[80] R. Castro, E. Kofman, and F. E. Cellier, “Quantization-based integration methods
for delay-differential equations,” Simulation Modelling Practice and Theory, vol. 19,
no. 1, pp. 314–336, 2011.
[81] R Castro and E Kofman, “An integrative approach for hybrid modeling, simulation
and control of data networks based on the devs formalism,” in Modeling and Simu-
lation of Computer Networks and Systems: Methodologies and Applications, Morgan
Kaufmann, 2015, ch. 18.
[82] L. Evans and P. Bryant, “LHC machine,” Journal of instrumentation, vol. 3, no. 08,
S08001, 2008.
[83] S. Chatrchyan, E. de Wolf, et al., “The CMS experiment at the CERN LHC,” Journal
of instrumentation, vol. 3, S08004–1, 2008.
[84] K. Aamodt, A. A. Quintana, R Achenbach, S Acounis, D Adamová, C Adler, M
Aggarwal, F Agnese, G. A. Rinella, Z Ahammed, et al., “The alice experiment at the
CERN LHC,” Journal of Instrumentation, vol. 3, no. 08, S08002, 2008.
[85] A. A. Alves Jr, L. Andrade Filho, A. Barbosa, I Bediaga, G Cernicchiaro, G Guerrer,
H. Lima Jr, A. Machado, J Magnin, F Marujo, et al., “The LHCb detector at the
LHC,” Journal of instrumentation, vol. 3, no. 08, S08005, 2008.
[86] M. Benedikt and F. Zimmermann, “Towards future circular colliders,” Journal of the
Korean Physical Society, vol. 69, no. 6, pp. 893–902, 2016.
[87] S. Mckee, “Networking: The view from hep,” Journal of Physics: Conference Series,
vol. 898, p. 022 001, 2017. doi: 10.1088/1742-6596/898/2/022001.
[88] CERN, The atlas collaboration, https://atlas.cern/discover/collaboration,
Accessed: 31-01-2019, 2019.
[89] M. Pozo Astigarraga, E. ATLAS Collaboration, et al., “Evolution of the ATLAS trig-
ger and data acquisition system,” in Journal of Physics: Conf. Series, IOP, vol. 608,
2015, p. 012 006.
[90] O. Balci, “Guidelines for successful simluation studies (tutorial session),” in Proceed-
ings of the 22nd conference on Winter simulation, IEEE Press, 1990, pp. 25–32.
[91] P. Kruchten, The rational unified process: An introduction. Addison-Wesley Profes-
sional, 2004.

[92] S. Robinson, R. Brooks, K. Kotiadis, and D.-J. Van Der Zee, Conceptual modeling
for discrete-event simulation. CRC Press, 2010.
[93] G. Lehmann, “Data acquisition and event building studies for the atlas experiment,”
PhD thesis, Bern University, 2000.
[94] L. Leahu, “Analysis and predictive modeling of the performance of the atlas tdaq
network,” PhD thesis, Bucharest, Tech. U., 2013.
[95] T. Colombo, H. Fröning, P. J. García, and W. Vandelli, “Modeling a large data-
acquisition network in a simulation framework,” in 2015 IEEE International Confer-
ence on Cluster Computing, 2015, pp. 809–816. doi: 10.1109/CLUSTER.2015.137.
[96] T. Colombo, H. Fröning, P. J. Garcìa, and W. Vandelli, “Optimizing the data-
collection time of a large-scale data-acquisition system through a simulation frame-
work,” The Journal of Supercomputing, vol. 72, no. 12, pp. 4546–4572, 2016.
[97] A. Santos, P. J. García, W. Vandelli, and H. Fröning, “Modeling resource utilization
of a large data acquisition system,” in International conference on Technology and
Instrumentation in Particle Physics, Springer, 2017, pp. 346–349.
[98] A. Santos, W. Vandelli, H. Froening, and P. J. Garcia Garcia, “Buffer provisioning for
large-scale data-acquisition systems,” in Proceedings of The 12th ACM International
Conference on Distributed and Event-based Systems (DEBS ’18), ACM, New York,
NY, USA, 2018.
[RFC793] J. Postel, “Transmission Control Protocol,” RFC Editor, RFC 793, 1981. [Online].
Available: https://tools.ietf.org/html/rfc793.
[99] H. Zimmermann, “Osi reference model–the iso model of architecture for open systems
interconnection,” IEEE Transactions on communications, vol. 28, no. 4, pp. 425–432,
1980.
[100] S. Tarbouriech, C. T. Abdallah, and J. Chiasson, Advances in communication control
networks. Springer Science & Business Media, 2004, vol. 308.
[101] V. Jacobson, “Congestion avoidance and control,” in ACM SIGCOMM computer
communication review, ACM, vol. 18, 1988, pp. 314–329.
[RFC 7414] M. Duke et al., “A Roadmap for Transmission Control Protocol,” RFC Editor, RFC
7414, 2015. [Online]. Available: https://tools.ietf.org/html/rfc7414.
[102] D. Chkliaev, J. Hooman, and E. De Vink, “Verification and improvement of the
sliding window protocol,” in International Conference on Tools and Algorithms for
the Construction and Analysis of Systems, Springer, 2003, pp. 113–127.
[103] J. Padhye, V. Firoiu, D. F. Towsley, and J. F. Kurose, “Modeling tcp reno perfor-
mance: A simple model and its empirical validation,” IEEE/ACM Transactions on
Networking (ToN), vol. 8, no. 2, pp. 133–145, 2000.
[104] Y. R. Yang and S. S. Lam, “General aimd congestion control,” in Icnp, IEEE, 2000,
p. 187.
[105] S. Jero, E. Hoque, D. Choffnes, A. Mislove, and C. Nita-Rotaru, “Automated attack
discovery in tcp congestion control using a model-guided approach,” 2018. doi: 10.
14722/ndss.2018.23119.

[106] U. R. Pujeri, V. Palaniswamy, P. Ramanathan, and R. Pujeri, “Comparative analysis
and comparison of various aqm algorithm for high speed,” Indian Journal of Science
and Technology, vol. 8, no. 35, 2015.
[107] S. Floyd and V. Jacobson, “Random early detection gateways for congestion avoid-
ance,” IEEE/ACM Transactions on networking, vol. 1, no. 4, pp. 397–413, 1993.
[108] B. Braden, D. D. Clark, J. Crowcroft, B. Davie, S. Deering, D. Estrin, S. Floyd, V.
Jacobson, G. Minshall, C. Partridge, L. Peterson, K. Ramakrishnan, S. Shenker, J.
Wroclawski, and L. Zhang, “Recommendations on queue management and congestion
avoidance in the internet,” RFC Editor, RFC 2309, 1998. [Online]. Available: http:
//www.rfc-editor.org/rfc/rfc2309.txt.
[RFC 3168] K. Ramakrishnan et al., “The Addition of Explicit Congestion Notification (ECN) to
IP,” RFC Editor, RFC 3168, 2001. [Online]. Available: https://tools.ietf.org/
html/rfc3168.
[109] Mininet Team, “Mininet: An instant virtual network on your laptop (or other PC),”
[Online]. Available: http://mininet.org. Accessed: 10 January 2017.
[110] J. Lessmann, P. Janacik, L. Lachev, and D. Orfanus, “Comparative study of wireless
network simulators,” in Networking, 2008. ICN 2008. Seventh International Confer-
ence on, IEEE, 2008, pp. 517–523.
[111] L Begg, W Liu, K Pawlikowski, S Perera, and H Sirisena, “Survey of simulators of
next generation networks for studying service availability and resilience,” 2006.
[112] D. Pal, “A comparative analysis of modern day network simulators,” in Advances in
Computer Science, Engineering & Applications, Springer, 2012, pp. 489–498.
[113] D. Curren, A survey of simulation in sensor networks, student project, 2007. [Online].
Available: www.cs.binghamton.edu/~kang/teaching/cs580s/david.pdf.
[114] B. Schilling, “Qualitative comparison of network simulation tools,” Technical report,
Institute of Parallel and Distributed Systems (IPVS), University of Stuttgart, Tech.
Rep., 2005.
[115] P. Nov, “Simulation of network structures,” Master thesis, Charles University in
Prague, Tech. Rep., 2006.
[116] D Nicol, “Comparison of network simulators revisited,” Dartmouth College, pp. 1–8,
2002.
[117] S. Duflos, G. Grand, A. Diallo, C. Chaudet, A. Hecker, C. Balducelli, F. Flentge, C.
Schwaegerl, and O. Seifert, “Deliverable D1.3.2: List of available and suitable simu-
lation components,” Technical Report, École Nationale Supérieure des Télécommunications
(ENST), 2006.
[118] M. Karl, “A comparison of the architecture of network simulators ns-2 and tossim,” in
Proceedings of Performance Simulation of Algorithms and Protocols Seminar. Institut
für Parallele und Verteilte Systeme, Universität Stuttgart, 2005.
[119] L. Hogie, P. Bouvry, and F. Guinand, “An overview of manets simulation,” Electronic
notes in theoretical computer science, vol. 150, no. 1, pp. 81–101, 2006.

[120] E. Egea López et al., “Simulation scalability issues in wireless sensor networks.,”
2006.
[121] V Efthimia, Free tools for network simulation, 2006.
[122] P. P. Garrido, M. P. Malumbres, and C. T. Calafate, “Ns-2 vs. opnet: A comparative
study of the ieee 802.11 e technology on manet environments,” in Proceedings of the
1st international conference on Simulation tools and techniques for communications,
networks and systems & workshops, ICST (Institute for Computer Sciences, Social-
Informatics and . . ., 2008, p. 37.
[123] D. Cavin, Y. Sasson, and A. Schiper, “On the accuracy of manet simulators,” in
Proceedings of the second ACM international workshop on Principles of mobile com-
puting, ACM, 2002, pp. 38–43.
[124] E. Weingartner, H. Vom Lehn, and K. Wehrle, “A performance comparison of re-
cent network simulators,” in Communications, 2009. ICC’09. IEEE International
Conference on, IEEE, 2009, pp. 1–5.
[125] D. M. Nicol, “Scalability of network simulators revisited,” in Proceedings of the Com-
munication Networks and Distributed Systems Modeling and Simulation Conference,
2003, pp. 1–8.
[126] D. Albeseder, M. Függer, F. Breitenecker, T. Löscher, and S. Tauböck, Small pc-
network simulation-a comprehensive performance case study. 2005.
[127] X. Zhou and H. Tian, “Comparison on network simulation techniques,” in Paral-
lel and Distributed Computing, Applications and Technologies (PDCAT), 2016 17th
International Conference on, IEEE, 2016, pp. 313–316.
[128] S. Siraj, A Gupta, and R. Badgujar, “Network simulation tools survey,” International
Journal of Advanced Research in Computer and Communication Engineering, vol. 1,
no. 4, pp. 199–206, 2012.
[129] J. H. Dshalalow, Frontiers in queueing: Models and applications in science and en-
gineering. CRC press, 1997, vol. 7.
[130] M. Harchol-Balter, Performance modeling and design of computer systems: Queueing
theory in action. Cambridge University Press, 2013.
[131] M. Mitzenmacher, The power of two choices in randomized load balancing, Ph.D.
thesis, University of California, Berkeley, 1996.
[132] N. D. Vvedenskaya, R. L. Dobrushin, and F. I. Karpelevich, “Queueing system with
selection of the shortest of two queues: An assymptotic approach,” Problems of In-
formation Transmission, vol. 32, no. 1, pp. 15–27, 1996.
[133] C. Graham, “Chaoticity on path space for a queueing network with selection of the
shortest queue among several,” J. Appl. Probab., vol. 37, no. 1, pp. 198–211, Mar.
2000. doi: 10.1239/jap/1014842277.
[134] A. Mukhopadhyay, A. Karthik, R. R. Mazumdar, and F. Guillemin, “Mean field
and propagation of chaos in multi-class heterogeneous loss models,” Performance
Evaluation, vol. 91, pp. 117 –131, 2015, Special Issue: Performance 2015, issn: 0166-
5316. doi: 10.1016/j.peva.2015.06.008.

[135] M. Bramson, Y. Lu, and B. Prabhakar, “Asymptotic independence of queues under
randomized load balancing,” Queueing Syst., vol. 71, pp. 247–292, 2012.
[136] T. Bonald, M. Jonckheere, and A. Proutière, “Insensitive load balancing,” SIGMET-
RICS Perform. Eval. Rev., vol. 32, no. 1, pp. 367–377, Jun. 2004, issn: 0163-5999.
doi: 10.1145/1012888.1005729. [Online]. Available: http://doi.acm.org/10.
1145/1012888.1005729.
[137] J. Leino and J. Virtamo, “Insensitive load balancing in data networks,” Comput.
Netw., vol. 50, no. 8, pp. 1059–1068, 2006, issn: 1389-1286. doi: 10.1016/j.comnet.
2005.09.009.
[138] V. Pla, J. Virtamo, and J. Martinez-Bauset, “Optimal robust policies for bandwidth
allocation and admission control in wireless networks,” Computer Networks, vol. 52,
no. 17, pp. 3258 –3272, 2008, issn: 1389-1286.
[139] M. Jonckheere and J. Mairesse, “Towards an erlang formula for multiclass networks,”
Queueing Systems, vol. 66, no. 1, pp. 53–78, 2010, issn: 0257-0130. doi: 10.1007/
s11134-010-9185-y.
[140] T. Bonald and A. Proutière, “Insensitive bandwidth sharing in data networks,”
Queueing Syst. Theory Appl., vol. 44, no. 1, pp. 69–100, 2003, issn: 0257-0130.
[141] M. Jonckheere, “Insensitive versus efficient dynamic load balancing in networks with-
out blocking,” Queueing Syst., vol. 54, no. 3, pp. 193–202, 2006.
[142] M. Jonckheere and B. J. Prabhu, “Asymptotics of insensitive load balancing and
blocking phases,” SIGMETRICS Perform. Eval. Rev., vol. 44, no. 1, pp. 311–322,
Jun. 2016, issn: 0163-5999. doi: 10.1145/2964791.2901454.
[143] S. Halfin and W. Whitt, “Heavy-traffic limits for queues with many exponential
servers,” Operations Research, vol. 29, no. 3, pp. 567–588, 1981.
[144] D. L. Jagerman, “Some properties of the Erlang loss function,” Bell System Technical
Journal, vol. 53, no. 3, pp. 525–551, 1974.
[145] S. Ryu, “FELIX: The new detector readout System for the ATLAS Experiment,”
CERN, Geneva, Tech. Rep. ATL-DAQ-PROC-2017-008, 2017. [Online]. Available:
https://cds.cern.ch/record/2253330.
[146] J Anderson, A Borga, H Boterenbrood, H Chen, K Chen, G Drake, D Francis, B
Gorini, F Lanni, G. L. Miotto, et al., “Felix: A high-throughput network approach
for interfacing to front end electronics for atlas upgrades,” in Journal of Physics:
Conf. Series, IOP, vol. 664, 2015, p. 082 050.
[147] E. M. Ledesma, “Simulación integrada de controladores continuo y discreto: Efec-
tos del control de congestión de TCP y nodos enrutadores sobre un controlador de
motores operando en red,” Universidad Nacional de Rosario, Facultad de Ciencias
Exactas, Ingeniería y Agrimensura Escuela de Ingeniería Electrónica, Tech. Rep.,
2015.
[148] A. Negri, “Evolution of the trigger and data acquisition system for the ATLAS ex-
periment,” in Journal of Physics: Conference Series, IOP Publishing, vol. 396, 2012,
p. 012 033.

[149] N Garelli, “The evolution of the trigger and data acquisition system in the ATLAS
experiment,” vol. 513, 2014. doi: 10.1088/1742-6596/513/1/012007.
[150] J Almeida, M Dobson, A Kazarov, G. L. Miotto, J. Sloper, I Soloviev, and R Tor-
res, “The ATLAS DAQ system online configurations database service challenge,” in
Journal of Physics: Conference Series, IOP Publishing, vol. 119, 2008, p. 022 004.
[151] A. D. Sicoe, G. L. Miotto, L. Magnoni, S. Kolos, and I. Soloviev, “A persistent back-
end for the ATLAS TDAQ online information service (P-BEAST),” in Journal of
Physics: Conference Series, IOP Publishing, vol. 368, 2012, p. 012 002.
[152] D. J. Foguelman, M. A. Bonaventura, and R. D. Castro, “Masada: A modeling and
simulation automated data analysis framework for continuous data-intensive valida-
tion of simulation models,” in 30th European Simulation and Modelling Conference,
2016.
[153] S. Kulkarni, P. Agrawal, et al., Analysis of tcp performance in data center networks.
Springer, 2014.
[154] T. Colombo, A. Collaboration, et al., “Data-flow performance optimisation on unre-
liable networks: The atlas data-acquisition case,” in Journal of Physics: Conference
Series, IOP Publishing, vol. 608, 2015, p. 012 005.
[155] R. Righter and J. G. Shanthikumar, “Scheduling multiclass single server queueing
systems to stochastically maximize the number of successful departures,” PEIS, vol.
3, pp. 323–333, 1989.
[156] G. Christen, A. Dobniewski, and G. Wainer, “Modeling state-based devs models in
cd++,” in Proceedings of MGA, advanced simulation technologies conference, 2004,
pp. 105–110.
[157] B. Chen and H. Vangheluwe, “Symbolic flattening of devs models,” in Proceedings of
the 2010 Summer Computer Simulation Conference, Society for Computer Simulation
International, 2010, pp. 209–218.
[158] G. F. Riley, R. M. Fujimoto, and M. H. Ammar, “A generic framework for paral-
lelization of network simulations,” in Proc. of the 7th International Symposium on
Modeling, Analysis and Simulation of Computer and Telecommunication Systems,
IEEE, 1999, pp. 128–135.
[159] J. S. Ahn and P. B. Danzig, “Packet network simulation: Speedup and accuracy
versus timing granularity,” IEEE/ACM Transactions on Networking (TON), vol. 4,
no. 5, pp. 743–757, 1996.
[160] D. Anick, D. Mitra, and M. M. Sondhi, “Stochastic theory of a data-handling system
with multiple sources,” Bell Labs Technical Journal, vol. 61, no. 8, pp. 1871–1894,
1982.
[161] S. Deb, S. Shakkottai, and R Srikant, “Stability and convergence of tcp-like conges-
tion controllers in a many-flows regime,” in INFOCOM 2003. Twenty-Second An-
nual Joint Conference of the IEEE Computer and Communications. IEEE Societies,
IEEE, vol. 2, 2003, pp. 884–894.

[162] Y. Yi and S. Shakkottai, “Flunet: A hybrid internet simulator for fast queue regimes,”
Computer Networks, vol. 51, no. 18, pp. 4919–4937, 2007.
[163] R. Castro, “Integrative tools for modeling, simulation and control of data networks,”
in Spanish, extended summary in English, PhD thesis, National University of Rosario,
Argentina, 2010.
[164] A. Bellen and M. Zennaro, Numerical methods for delay differential equations. Oxford
university press, 2013.
[165] E. Kofman, “Discrete event simulation of hybrid systems,” SIAM Journal on Scien-
tific Computing, vol. 25, no. 5, pp. 1771–1797, 2004.
[166] J. K. Hale and S. M. V. Lunel, Introduction to functional differential equations.
Springer Science & Business Media, 2013, vol. 99.
[167] G. Mao and L. R. Petzold, “Efficient integration over discontinuities for differential-
algebraic systems,” Computers & Mathematics with Applications, vol. 43, no. 1-2,
pp. 65–79, 2002.
[168] C. Kiddle, R. Simmonds, C. Williamson, and B. Unger, “Hybrid packet/fluid flow
network simulation,” in Proceedings of the seventeenth workshop on Parallel and
distributed simulation, IEEE Computer Society, 2003, p. 143.
[169] B. Liu, D. R. Figueiredo, Y. Guo, J. Kurose, and D. Towsley, “A study of networks
simulation efficiency: Fluid simulation vs. packet-level simulation,” in Proc. of the
20th Annual Joint Conference of the IEEE Computer and Communications Societies,
IEEE, vol. 3, 2001, pp. 1244–1253.
[170] K. Aboudolas and N. Geroliminis, “Perimeter and boundary flow control in multi-
reservoir heterogeneous networks,” Transportation Research Part B: Methodological,
vol. 55, pp. 265–281, 2013.
[171] M. Folk, G. Heber, Q. Koziol, E. Pourmal, and D. Robinson, “An overview of the
HDF5 technology suite and its applications,” in Proceedings of the EDBT/ICDT 2011
Workshop on Array Databases, ACM, 2011, pp. 36–47.
Chapter 7

Appendix

7.1 Proof of Theorem 2


Consider an explicit ODE defined by the following expression:

dx(t)/dt = f(x, t) = (a(t − x) − C) / a(t − x)    (7.1)

In order for Eq. (7.1) to be solvable by QSS methods we require that f(x, t) be globally Lipschitz
both in t and x.
Assume that a(t) is globally Lipschitz with a(t) ≥ ε > 0 ∀t and |a(t2) − a(t1)| < La |t2 − t1|,
considering C ∈ R+, ε ∈ R+ and ε arbitrarily small.
Let La and Lf be the Lipschitz constants of a(·) and f(·), respectively.
Then,

|f(x1, t) − f(x2, t)| = C |1/a(t − x2) − 1/a(t − x1)|                       (7.2)
                      = C |a(t − x1) − a(t − x2)| / (a(t − x1) a(t − x2))   (7.3)
                      ≤ C La |x1 − x2| / ε²                                (7.4)
                      ≤ Lf |x1 − x2|                                       (7.5)

with Lf = C La / ε².

Therefore, according to Theorems 1 to 6 in [8], the QSS approximation of system (7.1) is a well-posed
simulation method, since the stability properties of the original system are preserved and the
error can be reduced to arbitrarily small values. ∎

7.2 New Enhancements for the PowerDEVS Simulation toolkit


7.2.1 Py2PDEVS: a Python ↔ PowerDEVS interface
A Python-to-PowerDEVS interface, named Py2PDEVS [17], was developed to allow PowerDEVS
classes defined in C++ to be accessed directly from Python code. This aims at simplifying the
specification of big topological models using Python code, as an alternative to the PowerDEVS
GUI.
Figure 7.1 shows how Py2PDEVS integrates with the PowerDEVS compilation process. A Python
abstraction layer generates C++ classes based on the Boost.Python library, which allows exporting
them to Python. The PowerDEVS compilation process was updated to automatically generate a
Boost.Python class for every DEVS atomic model in PowerDEVS, so that they are also available
from Python code.
Executing simulation models defined with Py2PDEVS does not impose performance penalties.
This is because Python is only used to define the model structure; the simulation cycle and the
code of the atomic models are executed in C++ as in the usual workflow.

Figure 7.1: Py2PDEVS architecture

The development of Py2PDEVS allows for a cleaner and more powerful development pipeline.
Model definition, parameter initialization, parameter sweeping and finalization tasks can be
performed via Python scripts.
Having atomic models accessible from Python code allows for the definition of Python classes
that mimic DEVS coupled models. Figure 7.2 shows an example of the definition of a simple
network topology using Python code.
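To give a flavor of this style, the following self-contained Python sketch mimics the structure of such a topology definition. The Atomic/Coupled classes below are illustrative stand-ins only, not the real Py2PDEVS API (whose model classes are the Boost.Python-exported C++ atomics):

```python
# Illustrative stand-ins: real Py2PDEVS models are C++ classes exported
# via Boost.Python; these minimal classes only show the coupling style.
class Atomic:
    def __init__(self, name):
        self.name = name

class Coupled:
    def __init__(self, name):
        self.name = name
        self.components = []
        self.connections = []  # (src, src_port, dst, dst_port)

    def add(self, component):
        self.components.append(component)
        return component

    def connect(self, src, src_port, dst, dst_port):
        self.connections.append((src, src_port, dst, dst_port))

# a minimal sender -> router -> sink topology
top = Coupled("top")
sender = top.add(Atomic("sender1"))
router = top.add(Atomic("router1"))
sink = top.add(Atomic("sink1"))
top.connect(sender, 0, router, 0)
top.connect(router, 0, sink, 0)

print(len(top.components))   # 3
print(len(top.connections))  # 2
```

The same pattern scales to large topologies built in loops, which is the main motivation for scripting model structure instead of drawing it in the GUI.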

7.2.2 Configuration of Simulation Parameters


PowerDEVS was extended with a set of new infrastructure classes, which allow users to specify
simulation parameters in different ways.
Simulation parameters are read by atomic models and must be specified by the user before
starting a simulation.
In the PowerDEVS GUI, blocks define a list of parameters for each model. There, the user must
specify a value for each parameter. This value can be interpreted directly (e.g. a real number) or
can refer to the name of a parameter specified in one of the following alternatives:
174 Chapter 7. Appendix

Figure 7.2: Example Py2PDEVS code to represent a network topology

• Scilab workspace: variables can be read directly from the Scilab workspace. Therefore,
the user can define in the Scilab workspace a variable with any value which will be read by
the PowerDEVS atomic model. Strings are not allowed as variable values.

• Scilab file (.sce) (new): the file is loaded into the Scilab workspace at the start of the
simulation and then works just as the previous item. This allows for more flexibility by
having all parameters defined in a single file. Any Scilab expression or function can be used;
for example, a parameter can be specified depending on the values of other parameters.

• Command line (new): parameters can be specified on the command line using the format:
-<parameterName> <parameterValue>

• .ini file (new): a file to be loaded can be specified on the command line. This allows for more
flexibility than the command line, and having all parameters in a single file is more convenient.
The file must be in .ini format, i.e. each line must be a <parameterName>=<parameterValue> pair.
In this case expressions are not allowed, thus parameters cannot depend on each other.
Strings are allowed in this case.
It is possible to combine the .ini file with command-line parameters, where command-line
parameters take precedence.
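The precedence rule can be sketched as follows; load_parameters is a hypothetical Python helper written here only to illustrate the behavior, not PowerDEVS source code:

```python
# Sketch of the precedence rule: .ini values are loaded first, then
# command-line "-name value" pairs overwrite them.
def load_parameters(ini_lines, argv):
    params = {}
    for line in ini_lines:
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            name, value = line.split("=", 1)
            params[name.strip()] = value.strip()
    # command-line pairs take precedence over the .ini file
    i = 0
    while i < len(argv) - 1:
        if argv[i].startswith("-") and not argv[i].startswith("--"):
            params[argv[i][1:]] = argv[i + 1]
            i += 2
        else:
            i += 1
    return params

ini = ["ExperimentNumber=1", "sender1.packetSize_mu=3567"]
argv = ["-ExperimentNumber", "12"]
print(load_parameters(ini, argv)["ExperimentNumber"])  # 12
```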

A command-line parameter is used to choose where atomic models will read parameters from.
For example, the model can be executed as follows to read parameters from an .ini file and overwrite
the value of the 'ExperimentNumber' variable:

./model -tf 70 --parameter_reading_backend Cmdline -c ../myModel.params -ExperimentNumber 12
Atomic models read their parameters in the init function. To use the parameter configuration
infrastructure, the readDefaultParameterValue function must be used as follows:
7.2. New Enhancements for the PowerDEVS Simulation toolkit 175

// read string from the PowerDEVS GUI
char* fvar = va_arg(parameters, char*);

// read parameter from config
this->myParam = readDefaultParameterValue<double>(fvar);
Additionally, a new simulator parameter was added to specify a Python script to be
executed at the end of the simulation. This is useful for post-processing and plotting purposes.

./model -tf 70 -finalization_script myModel.plot.py

7.2.2.1 Stochastic Distribution parameters


Atomic models can read parameters interpreting them as stochastic distribution parameters using
the new function readDistributionParameter(string paramName). As shown in the listing below,
this method returns an instance of IDistributionParameter, which provides a nextValue method to
retrieve the next random value from the distribution. Probability distributions are implemented
using the C++ GSL library.
// get parameter name (e.g. "sender1.packetSize")
char* paramName = va_arg(parameters, char*);

// read from config as a distribution
std::shared_ptr<IDistributionParameter> period = readDistributionParameter(paramName);

...

// retrieve the next random value
this->sigma = period->nextValue();
Distribution parameters are specified in the configuration. Not all parameter backends support
string parameters (e.g. Scilab does not), thus each distribution is associated with an integer value.
The distributions currently available are: Constant=0, Exponential=1, Pareto=2, Bernoulli=3,
Normal=4, Uniform=5.
Each distribution takes different parameters, which are also read from the configuration as
shown below. Additionally, there is a global simulation parameter ReproducibleSimu which permits
setting the seed for the random generator:

• ReproducibleSimu=0 : a new seed is automatically generated in every simulation based on
the CPU time. The current implementation uses microsecond clock accuracy so that several
simulations executed at almost the same instant use different seeds. Generated seeds are
stored in the logs and in the results variables.

• ReproducibleSimu=1 : the default seed is used, which will generate always the same sequence
of random numbers. This is useful in many situations in which it is necessary to reproduce
exactly the same simulation, for example for tracking bugs.
176 Chapter 7. Appendix

• ReproducibleSimu=value: with value different from 0 and 1, the value is used as the seed for
the simulation. This is useful to reproduce a specific simulation using a same known seed.
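The three cases above amount to the following seed-selection rule, sketched here in Python (illustrative only; choose_seed and DEFAULT_SEED are hypothetical names, not the PowerDEVS implementation):

```python
import time

DEFAULT_SEED = 12345  # placeholder for the simulator's default seed

def choose_seed(reproducible_simu):
    if reproducible_simu == 0:
        # new seed per run, microsecond resolution so that simulations
        # started almost simultaneously still get different seeds
        return int(time.time() * 1e6)
    if reproducible_simu == 1:
        return DEFAULT_SEED   # always the same random sequence
    return reproducible_simu  # any other value is used as the seed itself

print(choose_seed(1))    # 12345
print(choose_seed(777))  # 777
```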

# Constant distribution
sender1.packetSize = 0
sender1.packetSize_value = 3567

# Exponential distribution
sender1.packetSize = 1
sender1.packetSize_mu = 3567

# Pareto distribution
sender1.packetSize = 2
sender1.packetSize_shape = 1
sender1.packetSize_scale = 0.1
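The mapping from these integer codes to samplers can be sketched as follows. This is illustrative only: PowerDEVS implements the distributions with the C++ GSL library, and make_sampler is a hypothetical helper:

```python
import random

# Sketch: build a sampler from the integer distribution codes above.
def make_sampler(params, name, rng):
    code = params[name]
    if code == 0:  # Constant
        return lambda: params[name + "_value"]
    if code == 1:  # Exponential with mean mu
        return lambda: rng.expovariate(1.0 / params[name + "_mu"])
    if code == 2:  # Pareto with shape and scale
        return lambda: params[name + "_scale"] * rng.paretovariate(params[name + "_shape"])
    raise ValueError("unknown distribution code %d" % code)

rng = random.Random(42)
cfg = {"sender1.packetSize": 0, "sender1.packetSize_value": 3567}
next_value = make_sampler(cfg, "sender1.packetSize", rng)
print(next_value())  # 3567
```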

7.2.3 Storage of Simulation Results


PowerDEVS was extended with a set of new infrastructure classes to configure different backends
to store simulation results.
Simulation results are stored as collections of variables in the form of time series, i.e.
a stream of simulated timestamps and associated values. Time series can be configured to be
stored in different output formats:

• Scilab workspace: after the simulation ends, all variables become available in the Scilab
workspace as two arrays: one for the time changes and another for the values.

• Comma separated values (CSV) (new): generates a comma separated value file.

• Hierarchical Data Format (HDF) (new): HDF file formats are specifically designed
to store and organize large amounts of data; PowerDEVS generates version 5
(HDF5) files [171]. These file formats provide libraries for most programming languages, making
them easily consumable from external programs. Some Python functions are also included
as part of PowerDEVS to open and plot the generated HDF5 files.

The backend to be used can be set with a configuration parameter. For example, the model
can be executed as follows from the command line to store results in an HDF5 file:

./model -tf 70.000000 --variable_logging_backend hdf5
It is important to note that the storage backend and the method to read parameters are
independent of each other. That is, it is possible to read parameters from Scilab and store
results in HDF5 files, as well as to read parameters from the command line and store results in Scilab.
Atomic models are responsible for logging variable values. This is done using the IPowerDEVSLogger
interface, which gets instantiated according to the configuration by the ConfigurationLogger
class, or is available for atomic models extending BaseSimulator. For example, the packetQueue
atomic model logs the discards_bits variable using the following code:
// at init
std::shared_ptr<IPowerDEVSLogger> logger = std::make_shared<ConfigurationLogger>(this);
...

// when the variable changes
logger->logSignal(t, packetSize, "discards_bits");
The library includes specific atomic models which log every event they receive, for example
the QSSLogger and PacketLogger, for QSS events and network packets, respectively.

Additionally, each individual variable can be configured to be sampled for logging purposes. In
this case not every event arrival is logged; instead the maximum, minimum, average, sum,
count, etc. are recorded. There are 2 options to sample events, assuming discrete or QSS
values: 1) discrete variable sampling and 2) continuous variable sampling. They differ in the
way they interpret new events: the discrete sampler logs the rate in Hz, while the
continuous sampler assumes values hold piecewise constant until the next event arrives (e.g. to
calculate the max/min).
A typical usage example is plotting a packet rate: it is convenient to sample outgoing packets
instead of logging every single packet. This can be done by configuring the packetLogger as follows:
sender_packet.sent.size_bits.logLevel=99999999   # turn on logging
sender_packet.sent.size_bits.logger=2            # discrete sampler
sender_packet.sent.size_bits.sample_period=0.1   # sampling period
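The difference between the two sampling modes can be sketched as follows (illustrative Python, not the PowerDEVS sampler code): over one sampling period, the discrete sampler reports an event rate in Hz, while the continuous sampler treats each value as holding piecewise constant until the next event.

```python
def discrete_rate(event_times, period):
    # discrete sampling: number of events per second in the period
    return len(event_times) / period

def continuous_max(events):
    # continuous sampling: each value holds until the next event,
    # so the maximum over the period is the maximum event value
    return max(value for _, value in events)

events = [(0.00, 5.0), (0.03, 2.0), (0.07, 9.0)]  # (time, value) pairs
print(discrete_rate([t for t, _ in events], period=0.1))  # ~30 Hz
print(continuous_max(events))                             # 9.0
```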

Also, a new pcap sampler atomic model receives simulated network packets and generates
standard .pcap output files. These files can then be opened by, e.g., the Wireshark packet
analyser. More details can be found in [147].
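For reference, the classic .pcap layout such files follow can be sketched in a few lines of Python. This writes the standard libpcap global header and one per-packet record; it is only an illustration of the file format, not the PowerDEVS atomic model:

```python
import struct

def pcap_bytes(packets):
    # global header: magic, version 2.4, tz offset, sigfigs, snaplen, linktype
    out = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    for ts, data in packets:
        sec, usec = int(ts), int((ts % 1) * 1e6)
        # record header: timestamp (s, us), captured length, original length
        out += struct.pack("<IIII", sec, usec, len(data), len(data)) + data
    return out

capture = pcap_bytes([(0.5, b"\x00" * 60)])
print(len(capture))  # 24-byte global header + 16-byte record header + 60
```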

7.2.4 Documentation and Docker Development Image


New step-by-step examples have been added to the PowerDEVS documentation. The documentation
covers the implementation of new atomic DEVS models as well as instructions for
creating basic network topologies. The new documentation can be found online at:
https://twiki.cern.ch/twiki/bin/view/Main/HowTosPowerDEVS

Additionally, to reduce the learning curve for new developers, we built a Docker image. The
Docker image contains all the software packages and tools needed to develop PowerDEVS models:
Python for post-processing, Scilab, Eclipse for writing code, C++ compilation and debugging tools,
Boost libraries, etc. The Docker image is hosted online on DockerHub in the project
camisa/powerdevs_devel:latest, accessible at: https://hub.docker.com/r/camisa/powerdevs_devel
The image can be downloaded directly with the command:

docker pull camisa/powerdevs_devel

However, it is recommended to use the script in the PowerDEVS repository, which maps a project
folder, user settings, etc. Instructions on how to start and use the PowerDEVS Docker image can
be found at https://twiki.cern.ch/twiki/bin/view/Main/PowerDEVSDocker

7.3 New PowerDEVS Library of Packet-Level Network Models

Several DEVS atomic and coupled models for packet-level simulation were added to the base
libraries of PowerDEVS. The most important models were described throughout this Thesis;
here we provide a more extensive technical description:

(a) Low Level Models (b) High Level Models

Figure 7.3: PowerDEVS Library of Packet-Level Network Models

7.3.1 RoutingTable (packet-level, fluid-flow and hybrid)


Description: Demultiplexes multiple input fluid flows into multiple output ports. The output
port to be used for each flow is defined in the configuration file, which maps the pair <nodeName,
flowId> to <outPort>.
This model expects input events containing instances of IFlow, thus it can be used in
packet-level, fluid-flow and hybrid models (see the class diagram in Figure 5.30).

The following code shows an example of configuring the paths for 2 different flows. Each
flow must be declared in the variable FlowIds. In this example we define 2 flows named
"sender1" and "sender2". For each flow, the variables <flowName>.routeNames and
<flowName>.routePorts must be defined, specifying router names and output ports,
respectively. In this example, when flow "sender1" arrives at router "router1" it departs
from output port 0, and when it arrives at "router2" it departs from output port 1.

# define all flows
FlowIds = {sender1, sender2}

# Path for sender1
sender1.routeNames = {router1, router2}
sender1.routePorts = {0, 1}

# Path for sender2
sender2.routeNames = {router1}
sender2.routePorts = {2}
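The lookup this configuration encodes can be sketched as follows (illustrative Python; build_routing is a hypothetical helper, not the implementation in FlowRouter.h):

```python
# Sketch: build the <nodeName, flowId> -> outPort table from the
# per-flow routeNames/routePorts lists of the configuration.
def build_routing(flows):
    table = {}
    for flow_id, (routers, ports) in flows.items():
        for router, port in zip(routers, ports):
            table[(router, flow_id)] = port
    return table

routes = build_routing({
    "sender1": (["router1", "router2"], [0, 1]),
    "sender2": (["router1"], [2]),
})
print(routes[("router1", "sender1")])  # 0
print(routes[("router2", "sender1")])  # 1
print(routes[("router1", "sender2")])  # 2
```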

                          Value/Description
In Ports     inPort0..N   std::shared_ptr<IFlow>
Out Ports    outPort0..M  std::shared_ptr<IFlow>
Parameters   see description
Header       atomics/network/FlowRouter.h

Table 7.2: RoutingTable atomic model (packet-level, fluid-flow and hybrid)

7.3.2 FlowGenerator
Description: Generates packets according to the specified configuration. The configuration allows
setting stochastic distributions for packet sizes and inter-generation times, as well as a sequence
of time instants at which packet generation starts and stops. Distributions are configured as
specified in Section 7.2.2.
It has an additional input port which forces the generation of a new packet at the
moment of the event arrival. This can be used to generate packets following an external source
distribution.
The following code shows an example of configuring the generation of the "sender1" flow. Each
generated flow must be declared in the variable PacketFlowNames. In this example we define
2 flows named "sender1" and "sender2". For each flow, the following variables must be
specified:

• <flowName>.period defines the amount of time which elapses between 2 consecutive packet
generations. It is a distribution parameter, so its supporting parameters should also be
specified. After each packet generation, a random number is drawn from the configured
distribution to schedule the time of the next generation.

• <flowName>.packetSize defines the payload size of the generated packet. The total packet size
is determined by the payload size plus the size of the headers according to the protocol used. It
is a distribution parameter, so its supporting parameters should also be specified. When a
packet is created, a random number is drawn from the configured distribution and set as the
payload size.

• <flowName>.startStopTimes defines the time instants at which packet generation starts and
stops. It is an ordered list of time instants t1, .., tn. Odd time instants are interpreted as
times at which packet generation (re)starts; even time instants are interpreted as times at
which it stops.

• (Optional) <flowName>.typeOfService defines the type of service to be used by Quality of
Service (QoS) queues.

# define all flows
PacketFlowNames = {sender1, sender2}

# sender1
sender1.period=0                  # Constant
sender1.period_value=0.5          # 2 per second
sender1.packetSize=1              # Exponential
sender1.packetSize_mu=3624        # mean in bits
sender1.startStopTimes= {0, 2, 5} # start/stop times
sender1.typeOfService= 0          # for QoS
...
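The startStopTimes semantics can be sketched as follows (illustrative Python; is_generating is a hypothetical helper, not the atomic model code):

```python
import bisect

# With the ordered list t1..tn, generation is active after an
# odd-numbered instant and inactive after an even-numbered one.
def is_generating(t, start_stop_times):
    # count instants already passed; an odd count means generating
    return bisect.bisect_right(start_stop_times, t) % 2 == 1

times = [0, 2, 5]  # start at 0, stop at 2, restart at 5
print(is_generating(1, times))  # True
print(is_generating(3, times))  # False
print(is_generating(6, times))  # True
```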

                         Value/Description
In Ports     inPort0     event value is not used
Out Ports    outPort1    std::shared_ptr<NetworkPacket>
Parameters   see description
Header       atomics/network/FlowRouter.h

Table 7.4: FlowGenerator atomic model

7.4 New Library of Fluid-Flow Network Models


Several new atomic and coupled models for fluid-flow simulation were added to the base libraries of
PowerDEVS. Atomic models provide base generic calculations of QSS methods; coupled models
rely on QSS blocks to implement the ODEs described in the body of the Thesis. Here we provide
a more extensive technical description:

(a) Low Level Models (b) High Level Models

Figure 7.4: PowerDEVS Library of Fluid-Flow Network Models

7.4.1 QSS Bounded Integrator Coupled Model


Description: detects switching conditions to halt integration and guarantees that the resulting
integrated value remains within a maximum value Qmax and a minimum value Qmin.
The QBI block interface is the same as that of the classic QSS integrator, so that both can be used
interchangeably, while QBI accepts 2 additional parameters, Qmax and Qmin, for the maximum and
minimum values, respectively.
The model implements the equations described in Section 5.4.4.
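The bounded-integration idea can be sketched with a plain Euler loop (illustrative only; the actual QBI block uses QSS and detects the switching condition exactly rather than clamping per step):

```python
# Integrate a derivative but halt at the bounds instead of overshooting.
def bounded_integrate(derivative, q0, q_min, q_max, dt, steps):
    q = q0
    for k in range(steps):
        dq = derivative(k * dt)
        # saturate the state within [q_min, q_max]
        q = min(q_max, max(q_min, q + dq * dt))
    return q

# a constant positive input saturates at q_max
print(bounded_integrate(lambda t: 3.0, 0.0, 0.0, 1.0, 0.1, 10))  # 1.0
```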

Coupled Implementation

Table 7.5: QSS Bounded Integrator (QBI) coupled model

7.4.2 Reservoir Coupled Model


Description: Represents a reservoir for multiple incoming and outgoing fluids.

There are output ports with the reservoir total fill Q(t) and the delay qDelay(t), which represents
the time elapsed since a fluid enters until it departs. Additionally, for each incoming flow there are
related ports for the departure rate departureRate_ai(t) and the drop rate dropsRate_ai(t) of that
flow.
Parameters include the maximum allowed size maxSize, the output rate capacity Capacity (C),
and the QSS quanta dQmin, dQrel.
The model implements the equations described in Section 5.3.
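A single-flow caricature of these dynamics can be sketched with an Euler loop (illustrative only; the real model solves the Section 5.3 equations with QSS and handles multiple flows):

```python
# Fill Q(t) integrates arrival minus capacity, saturated in [0, maxSize];
# while full, the excess arrival rate is dropped.
def reservoir_step(q, arrival, capacity, max_size, dt):
    drop = max(0.0, arrival - capacity) if q >= max_size else 0.0
    dq = arrival - capacity - drop
    q = min(max_size, max(0.0, q + dq * dt))
    delay = q / capacity  # time a fluid particle spends queued
    return q, drop, delay

q, t = 0.0, 0.0
while t < 10.0:
    q, drop, delay = reservoir_step(q, arrival=12.0, capacity=10.0,
                                    max_size=5.0, dt=0.01)
    t += 0.01
print(q, drop)  # once full: fill stays at maxSize, excess rate is dropped
```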

Coupled Implementation

Table 7.6: Reservoir coupled model for 2 flows

7.4.3 Buffer-Server system Coupled Model


Description: Represents a network buffer accepting multiple flows and a server dequeuing from
the buffer at a given speed.
It is implemented as a wrapper around the reservoir fluid model to accept events with IFlow
values. The model unwraps the flow attributes to feed the reservoir model. The output drop and
delay signals from the reservoir model are added to the original flow attributes and wrapped again
in an IFlow instance.
Parameters are the same as for the reservoir model.

Coupled Implementation

Table 7.7: Buffer-server system coupled model

7.5 New Library of Hybrid Network Models


New atomic and coupled models for hybrid simulation were added to the base libraries of
PowerDEVS:

Figure 7.5: PowerDEVS Library of Hybrid Network Models

7.5.1 Packet2HybridFlow Atomic Model


Description: Receives discrete packets (inPort0) and outputs HybridFlows (outPort0) containing
the packet together with a continuous rate d. The amplitude of the continuous rate d is always
equal to the incoming link capacity (Bandwidth C parameter), and the signal is sent for a period
of time equal to packetSize/C.
If the host sends data faster than the link bandwidth, packets are queued in this model (logged
to the 'queueSize' signal); for dropping, logging, etc. it is better to use a proper PacketQueue.h.
This model acts as a bandwidth delay for packets; propagation delay should be applied before
this model by the discrete 'packetLinkDelay.h' or after it by a continuous 'qss_delay'. Contrary
to hybridizationLink and Packet2SmoothFlow, this model sends the packet as soon as it arrives
(i.e. WITH_FIRST_BIT, as implemented in hybridizationLink) and only afterwards waits a
bandwidth delay before processing the next packet. This is because downstream models (e.g.
packetDiscard) need the packet to be received before the signal.
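The packet-to-rate conversion described above can be sketched as follows (illustrative Python, not the atomic model): each packet becomes a rate pulse of amplitude C lasting packetSize/C seconds, and back-to-back packets queue behind the bandwidth delay.

```python
# Convert packet arrivals (time, size_bits) into rate segments
# (start, end, rate) of amplitude equal to the link capacity.
def packets_to_rate_segments(arrivals, capacity_bps):
    segments, busy_until = [], 0.0
    for t, size_bits in arrivals:
        start = max(t, busy_until)  # wait if the link is still busy
        end = start + size_bits / capacity_bps
        segments.append((start, end, capacity_bps))
        busy_until = end
    return segments

segs = packets_to_rate_segments([(0.0, 1000), (0.0005, 1000)], 1e6)
print(segs[0])  # (0.0, 0.001, 1000000.0)
print(segs[1])  # second packet queued until the first transmission ends
```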

                            Value/Description
In Ports     inPort0        std::shared_ptr<NetworkPacket>
Out Ports    outPort0       std::shared_ptr<HybridFlow>
Parameters   Bandwidth C    capacity of the link in bps
Header       atomics/hybrid_network/Packet2hybridFlow.h

Table 7.9: Packet2HybridFlow atomic model


7.5.2 HybridMerge Atomic Model


Description: Applies fluid-flow metrics to incoming packets.
Each incoming packet is affected as follows: 1) the packet is discarded with the drop probability
given by FluidFlow.dropRate; 2) a delay is applied according to FluidFlow.delay.
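These two effects can be sketched as follows (illustrative Python; merge is a hypothetical helper, not hybridMergeFluidIntoPacket.h):

```python
import random

# Each packet is dropped with the fluid drop probability; survivors
# are shifted in time by the fluid delay.
def merge(packets, drop_prob, fluid_delay, rng):
    out = []
    for t, pkt in packets:
        if rng.random() < drop_prob:
            continue  # packet discarded
        out.append((t + fluid_delay, pkt))
    return out

rng = random.Random(7)
survivors = merge([(0.0, "p1"), (0.1, "p2")], drop_prob=0.0,
                  fluid_delay=0.05, rng=rng)
print(survivors)  # both packets survive, shifted by the fluid delay
```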

                         Value/Description
In Ports     inPort0     std::shared_ptr<NetworkPacket>
             inPort1     std::shared_ptr<FluidFlow>
Out Ports    outPort0    std::shared_ptr<NetworkPacket>
Parameters
Header       atomics/hybrid_network/hybridMergeFluidIntoPacket.h

Table 7.11: HybridMerge atomic model

7.5.3 HybridDemux Atomic Model


Description: Demultiplexes fluid flows, detecting the ones that were created by a hybridLink.
Hybridized flows are sent through outPort0, the rest through outPort1.

                         Value/Description
In Ports     inPort0     std::shared_ptr<HybridFlow>
Out Ports    outPort0    std::shared_ptr<HybridFlow>
             outPort1    std::shared_ptr<HybridFlow>
Parameters
Header       atomics/hybrid_network/hybridDemultiplex.h

Table 7.13: HybridDemux atomic model

7.5.4 Hybrid RED Port Coupled Model


Description: Represents an egress port with RED capabilities. Accepts both network and fluid
flows.

Coupled Implementation

Table 7.14: Hybrid RED Port coupled model

7.5.5 Hybrid Router Coupled Model


Description: Represents a network router. Accepts both network and fluid flows. Ingress and
egress ports can be added according to the routing table configuration.

Coupled Implementation

Table 7.15: Hybrid Router coupled model with 3 ingress and 2 egress ports
