
2015 12th Working IEEE/IFIP Conference on Software Architecture
Why Data needs more Attention in Architecture Design

Experiences from prototyping a large-scale mobile app ecosystem

Matthias Naab, Susanne Braun, Steffen Hess, Andreas Eitel, Dominik Magin
Fraunhofer IESE, Kaiserslautern, Germany

Torsten Lenhart, Ralf Carbon, Felix Kiefer
John Deere ETIC, Kaiserslautern, Germany
Abstract—Data is of great importance in computer science and in particular in information systems, and how data is treated has a major impact on a system's quality attributes. Nevertheless, software architecture research, literature, and practice often neglect data and focus instead on other architectural topics like components and connectors or the management of architecture decisions in general. This paper contributes experiences from the prototyping of a large-scale mobile app ecosystem for the agricultural domain. Architectural drivers like multi-tenancy, different technical platforms, and offline capability led to deep reasoning about data. In this paper, we describe the architectural decisions made around data in the app ecosystem and present our lessons learned on technical aspects regarding data, but also on data modeling and general methodical aspects of how to treat data in architecting. We want to share these experiences with the research community to stimulate more research on data in software architecture, and we want to give practitioners usable hints for their daily work around data when constructing large information systems and ecosystems.

Keywords—software architecture; architecture design; data; mobile; app ecosystem; practical experiences

978-1-4799-1922-2/15 $31.00 © 2015 IEEE
DOI 10.1109/WICSA.2015.13

I. INTRODUCTION

No one would doubt that data is important in computer science and in particular in information systems. The way data is created, transformed, transported, queried, and stored has a major impact on the quality attributes of a software system. Having the right data at the right place at the right time is essential for a good user experience and a perceived high performance of a system. Additionally, dealing with data in the right way is key for security. Many technologies exist and emerge that support the transformation, transportation, querying, and storage of data. Various architectural decisions around data are therefore necessary to conceptually organize data and to appropriately leverage technologies. On the other hand, the architecture decisions on data handling made to achieve quality attributes often lead to increased complexity and reduced maintainability, a classical architecture tradeoff.

A. Increasing Importance of Data for Architecture Decisions

Looking at the architectures of today's internet giants like Google, Facebook, Netflix, or Amazon reveals that many of their important architecture decisions deal with data handling. While some years ago many information systems were rather local to a company and relied on the classical relational database approach, the increasing amount of data, distribution, and interconnection now allow new types of systems that require innovative ideas for handling data. This goes as far as inventing new file systems like the Google File System [1] and database systems like DynamoDB [2] or Cassandra [3]. Much effort is spent on achieving a high user experience and a high level of data security and other quality attributes. While these examples might be perceived as somewhat extreme, they nevertheless shape the expectations of users of other software systems. Certainly, there have been many systems with a high focus on data before, e.g. in scientific computing or information systems. However, our observation in projects with industrial customers is that an increasing number of systems need more attention for data in architecture design, and many architects in enterprises are not yet aware of this or do not have the necessary guidance.

Making the right architecture decisions around data handling thus becomes increasingly important for all architects. The availability of mobile devices together with varying connectivity and bandwidth is a major source of challenges for architecture decisions around data. Bringing the right data at the right point in time to the mobile apps has a key impact on the resulting user experience. That might require completely new data models, APIs, and data transport protocols. If offline capability is needed as well, synchronization of data also has to be handled. The availability of cloud technologies allows for very flexible and elastic deployment of systems. How well a system scales, how cost-efficiently data storage is used, and whether the business logic is appropriately realized by technical consistency models strongly depends on the right architecture decisions. The availability of big data technologies like NoSQL databases, MapReduce [4] frameworks, or in-memory stores offers many more options to deal with data in unprecedented ways to offer new features, more performance, and a better user experience. The trend towards highly interconnected systems, including the connection of embedded systems and information systems, e.g. in the area of car-to-car and car-to-X communication, further emphasizes the need for appropriately dealing with data in architecture design. These trends and technologies are only examples that allow for new business models and systems; many more have an impact on dealing with data.

B. Architecture Research and Literature

While the state of the practice shows numerous examples of sophisticated architecture solutions around data, the literature and research on software architecture seem to care less about this. In [5], Gorton and Klein point out the convergence of
distribution, data, and deployment and the increasing complexity and demands for new skills to make appropriate architecture decisions in selecting and applying these technologies. This also supports that more attention to data is necessary.

In classical architecture definitions and books, definitions based on elements and their relationships play a major role (e.g. [6]). However, these elements are typically interpreted as components, modules, or computational nodes, but not in the sense of data. Another direction of definitions for software architecture sees architecture as the sum of fundamental decisions (e.g. [7]). The term "architecture decision" is extremely broad and easily allows covering the decisions around data as we did in this paper. However, this is also not often seen in the literature. Rather, the publications on decisions care about architectural knowledge management [8], but not so much about the content of the decisions. Consequently, data also plays a minor role in most of today's books on software architecture and in guidelines on architecture design.

For architecture documentation, view frameworks provide guidance. Unsurprisingly, the common architecture view frameworks like [9], [10], [11] and others do not have much focus on data, but rather on the software parts realizing functionality, like components or modules. Motivated by industrial needs, further view frameworks introduced and emphasized the need to represent data, like [12], [13], [14], [15], or [16]. Their strong focus is on the analysis and description of the data model, but less on the connection to architectural components and key decisions on treating the data. The result can be observed in industry: very seldom can architecture views be seen that explain data aspects beyond basic data flows or data models, if at all. Thus, we included data and its connection to components explicitly in our ACES-ADF view framework, but the data aspect also needs more elaboration and guidance [17]. More focus on the documentation of data is put in Enterprise Architecture Management (EAM), e.g. in the Zachman Framework [18].

More attention to data is given in the literature on system design approaches. In particular, in the area of object-orientation there is by definition the duality of functionality and data. This is also reflected in books like [19], [20], or [21]. In [22], a visual representation of a spiral illustrates that functionality and data are designed in an intertwined manner. Although these books are highly useful for learning how to treat data, the link to the discipline of software architecture is too weak, as these books lack a focus on quality attributes. And exactly those architectural decisions made to deal with quality attributes often lead to massive changes of simple data and functionality modeling, with many implications on how to treat data.

In particular, in the data management scientific community there are many publications with a strong focus on data. Publications on building innovative data technologies impact the possibilities of software architects in building systems, but they are not really useful in assisting the architect who wants to adequately use such technologies. In other communities, like mobile computing, publications on aspects of data in architectures can also be found. In [23], a solution is described which is also for the agricultural domain and involves some data aspects. Also, [24] and [25] provide publications with data aspects, but they too lack clear ways of describing data in software architecture and stay rather vague about data (e.g. no information on the representation of data models, principles of data modeling, or data transport granularity).

In the conclusion of the paper, we will outline how the architecture community could strengthen the focus on data and provide direct benefits to practice.

C. Goals of this Paper and Contributions

The main goal of this paper is to emphasize the importance of architecture decisions around data. The scope is mainly information systems with a trend to mobile and cloud computing. We want to support architects in practice in how to approach recurring challenges on a quality attribute level, on a technical level, and in architecture decisions around data. Further, we want to stimulate more efforts and discussions in the scientific architecture community on making and documenting architecture decisions around data.

To achieve these goals, we summarize our experiences from the prototyping of a large-scale app ecosystem. This app ecosystem comes with several of the aspects pointed out in Section I.A: it has a strong focus on mobile, but it also has aspects of cloud computing, big data, and highly interconnected systems. We describe the quality attributes, the detailing of architecturally significant requirements, and how we addressed them with architectural decisions. Additionally, we analyzed our approach and our decisions and distilled lessons learned. We contribute these lessons learned grouped along different categories like technical aspects, data modeling, and architecting around data in general.

This paper describes in Section II the mobile app ecosystem for the agricultural domain, which John Deere and Fraunhofer IESE prototyped. The design and development approach towards the app ecosystem is outlined in Section III. The key architectural decisions around data made in the app ecosystem and the lessons learned are described in detail in Section IV. We conclude with an overall discussion in Section V.

II. THE AGRICULTURAL MOBILE APP ECOSYSTEM

[Figure: architectural overview — native apps (App 1–4) on «Computing Node» tablets and phones of Farmer 1…Farmer n, a central «Computing Node» Backend, and external «Computing Node»s Weather Service and External Data Provider]
Fig. 1: Architectural Overview of App Ecosystem

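Fig. 1 shows native apps on many farmers' devices sharing one multi-tenant backend. The decisions detailed in Section IV attach a few recurring technical attributes to the entities these parties exchange: a revision number for differential loading, a synchronization status tracked on the client, and a tenant discriminator. As a minimal sketch of what such an entity base could look like (all names here are hypothetical illustrations, not code from the project):

```java
// Hypothetical sketch: an entity carrying the technical attributes that the
// paper's decisions introduce — a revision number (differential loading), a
// client-side synchronization status, and a tenant discriminator.
public class SyncableEntity {
    public enum SyncStatus { INSYNC, UPDATED, CREATED, DELETED }

    private final String id;
    private long revision;          // increased by the backend per committed transaction
    private SyncStatus syncStatus;  // tracked on the client since the last synchronization
    private String organizationId;  // tenant discriminator; null marks provider-created data

    public SyncableEntity(String id, String organizationId) {
        this.id = id;
        this.organizationId = organizationId;
        this.syncStatus = SyncStatus.CREATED; // new local entity, not yet synchronized
        this.revision = 0;
    }

    /** Called on the client when the entity is modified in the local replica. */
    public void markDirty() {
        if (syncStatus == SyncStatus.INSYNC) {
            syncStatus = SyncStatus.UPDATED;
        }
        // a CREATED or DELETED entity keeps its status until the next sync
    }

    /** Called when a backend transaction commits this entity. */
    public void commit(long transactionRevision) {
        this.revision = transactionRevision;
        this.syncStatus = SyncStatus.INSYNC;
    }

    public boolean isDirty() { return syncStatus != SyncStatus.INSYNC; }
    public long getRevision() { return revision; }
    public SyncStatus getSyncStatus() { return syncStatus; }
    public String getOrganizationId() { return organizationId; }
    public String getId() { return id; }
}
```

A real implementation would map these attributes onto the persistence technology of each platform (CoreData on iOS, SQLite on Android, MySQL on the backend), as the decisions in Section IV describe.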
Mobile devices offer great potential to farmers, as farming work is mostly performed outside. By means of mobile apps, farmers can leverage IT support wherever they are currently located. IT also plays a major role in so-called Precision Farming. In a large-scale prototyping project performed over the course of 1.5 years, Fraunhofer IESE and John Deere developed various prototypes of members of a mobile app ecosystem to support mobile farm management. The ecosystem is characterized by a strong focus on the agricultural domain and an openness that allows future apps to be contributed in order to increase the overall attractiveness.

Fig. 1 depicts an architectural overview of the app ecosystem. Farmers and their operators are the potential users of the apps. They use native apps on tablet and phone devices (iOS and Android). The system requires the availability of a central backend for data management and exchange. The backend works in a multi-tenancy way to allow efficient operation for all customers with their apps on the same backend machines. The backend is connected to external data sources, for example a weather data provider.

III. DESIGN AND DEVELOPMENT APPROACH

The app ecosystem was developed in a large-scale prototyping project. That is, not a market-ready product was developed but a high-quality prototype. The main goals behind this procedure were the following: (1) Explore technologies and architectural solutions in detail and gather experiences for potential future system development. (2) Bring the apps to real users, get early user feedback, and adjust the requirements and concepts accordingly. (3) Increase development speed for fast learning, which is done by a completely parallel team that is not impacted by existing systems and can fully concentrate on the app ecosystem.

The development approach was highly iterative and flexible. The business stakeholders at John Deere were closely involved in the development. As soon as a working version with a meaningful feature set was available, it was provided to the first real customer, a farmer, who used it and gave first feedback.

A strong focus of learning was on User Experience (UX) and on an appropriate architecture. Thus, although a prototype was developed, a strong focus was on high quality, in particular concerning all relevant quality attributes. This is different from many other prototyping development projects. The only tradeoffs made with respect to quality attributes were reduced scalability and less investment into reliability and stability.

The system architecture was defined based on the Architecture-centric Engineering Solutions (ACES) approach of Fraunhofer IESE [12]. A strong focus was on a detailed elicitation of architectural drivers from all relevant stakeholders. The architectural drivers were specified as architectural scenarios, covering many quality attributes and resulting in more than 60 scenarios. The architecture was designed incrementally in parallel to the development project. Parts of the architecture are described in the next section. The documentation of the architecture was done with a mix of architecture decisions and architectural views according to ACES.

IV. EXPERIENCES FROM ARCHITECTURE DECISIONS AROUND DATA

A. Our Solution and Architecture Decisions around Data

In the following sections, we sketch parts of our architecture solution by providing an excerpt of the architecture decisions made around data. The grouping of decisions is similar to the one in our architecture document, but slightly optimized for the pure focus on data. The decisions are described in a more condensed way in this paper, not discussing all the details. The main goal is to convey an impression of the multitude and diversity of architectural decisions around data.

1) Decisions on Data Synchronization

Decision: Support for offline capability and synchronization
Drivers addressed:
- Availability of service and data even for rural areas and farmland without cellular connectivity
- Usability
Rationale, explanations & tradeoffs:
- Apps will only be accepted by the users if they can be used in areas with no or only limited cellular connectivity
- Data assisting farm workers during a job on the field must be available
- Data created by the farm worker during the execution of his job, like work records and status updates, must be persisted on the mobile device. It must be guaranteed that this data can later be persisted into the backend system of the mobile app ecosystem (synchronization).
- Tradeoff: Significant complexity added to the system

Designing and implementing sound synchronization for individual software applications is one of the most challenging tasks that software engineers and software architects can face. As it is such a complex task, why offer support for offline capability and data synchronization at all? After all, mobile networks are being massively expanded and will become even more widespread in the next years. The main reason behind our decision was that we wanted to provide outstanding usability. Our users will be mainly operating in rural areas, where experts expect many spots with no or only limited cellular connectivity. If our apps could only be used with network coverage, this would be disastrous for usability, since farm workers are out on the field for most of their working time. The two key requirements for the synchronization mechanism are therefore: (1) Data assisting the farm worker during his job on the field must be available on the mobile device at any time. (2) Data created by the farm worker during the execution of his job, like work records and status updates, must be persisted on the mobile device. It must be guaranteed that this data can later be persisted into the backend system of the mobile app ecosystem (synchronization).

Decision: Design and implementation of a custom data synchronization mechanism
Drivers addressed:
- Dependencies between synchronization, business logic, and usability
- Fine-grained control over the synchronization mechanism
Rationale, explanations & tradeoffs:
- Performance and scalability of the overall system depend heavily on the correct setup of the database replication and synchronization mechanism
- Uncertainty whether the real-world use cases of customers could be realized with third-party products
- Existing out-of-the-box solutions of database system vendors seem easy to use at first glance, but the complexity of the overall topic is still there and must be addressed appropriately by the development team
- Existing out-of-the-box solutions typically hide their internal mechanisms, and thus it is hard to estimate how they scale
- Existing out-of-the-box solutions typically provide only eventual consistency [26], which offers a quite low isolation
level and often requires manual handling of conflicts nevertheless
- Requirements engineers, usability experts, and software architects have to work together closely on this task. That way you can find a solution that serves your specific application in an optimal way, while at the same time ignoring all the peculiarities and cases that need not be considered for your special domain, exploiting that you do not have to solve database replication and synchronization for the general case
- Tradeoff: Complex, costly, and challenging to design and implement

Decision: Database entities are extended by a revision number and a synchronization status
Drivers addressed:
- Differential loading of updated entities
- Client-side tracking of update operations
Rationale, explanations & tradeoffs:
- In order to enable clients to load only entities that have been updated since their last synchronization, we assign revision numbers to entities
- In order to track update operations on the client, we add one additional attribute that holds the update status of the entity since the last synchronization operation
- The current update statuses are: INSYNC (no update since last synchronization), UPDATED, CREATED, and DELETED
- Tradeoff: Extension of database entities with additional technical attributes; a complete extension of all database entities might not be possible in existing applications

In order to implement differential loading of updated entities and conflict detection, the entities of our persistence layer are extended with a revision number (similar to Hibernate's [27] version number). If an entity is updated during a database transaction on the backend, the revision number of the entity is increased. The mobile clients do not make any changes to the revision number, but they have to store it locally with the entities as well. If a mobile client updates an entity in its local replica database, this entity is marked as being dirty (status UPDATED). When the client finally wants to synchronize its changes with the master database, the following steps are performed:

- The client collects all entities that have been marked as dirty and sends them in one HTTP PUT request to the backend (client delta)
- The backend starts a new database transaction
- The backend preprocesses the client delta and checks whether entities of the client delta conflict with other concurrent modifications (conflict detection)
- If the backend detects conflicts, it merges the state of the conflicting entities with their current database state (conflict resolution)
- The backend assigns the revision number of the current backend database transaction to all updated and merged entities
- The backend persists the updated and merged entities into the backend database
- The backend database transaction is committed and an HTTP OK is sent to the client
- The client sends an HTTP GET request with the largest revision number it has ever seen
- In a transaction, the backend collects all entities with a revision number greater than the one sent by the client and sends them back to the client (backend delta)
- The client applies the updates of the backend delta to its local replica database in the context of a local transaction

Currently, synchronization is realized with two HTTP roundtrips: one for the PUT operation and another one for the GET operation. Although PUT and GET are realized independently and seem to be loosely coupled, we discovered two relatively strong limitations:

- Clients having dirty entities within their replicas must perform PUT before they can perform a GET operation. Otherwise, the clients could accidentally overwrite their own not-yet-synchronized changes, or would have to implement conflict detection and resolution as well.
- Clients must immediately perform a GET operation after the PUT operation. Otherwise, clients do not get the updated revision numbers of the entities that have been updated with their PUT operation. If the user then performed consecutive updates on the very same entities, the client would conflict with its own previous updates.

In consequence, these two operations cannot be considered independent, and therefore the decision to separate them might be revised.

Decision: Sync GET and Sync PUT as independent operations
Drivers addressed:
- GET and PUT are loosely coupled and can be performed independently
Rationale, explanations & tradeoffs:
- Clients can perform GET independent of PUT and the other way round
- Clients can load updates of concurrent users without needing to transmit local changes
- Tradeoff: Two HTTP round trips instead of one. Design weakness: with the current approach, GET and PUT cannot be considered independent.

Decision: Data is not physically deleted in the backend
Drivers addressed:
- Differential loading of changes
- Offline capability
Rationale, explanations & tradeoffs:
- Entities that have been deleted on the client are not physically deleted in the backend database, but marked as deleted by setting the synchronization status to DELETED
- The revision number is increased as well
- That way, entity delete operations can be delivered to the clients with the standard synchronization algorithm
- Tradeoffs: Data cannot be physically deleted, which conflicts with the data privacy rules of many German companies; growth of database table sizes

2) Decisions on App Data Handling

Decision: Use CoreData as iOS client persistence technology
Drivers addressed:
- CRUD operations at object level
- Easier data access
- Automatic validation of object properties (mandatory fields, to-many relationships, default values, …)
Rationale, explanations & tradeoffs:
- Mainly helps in auxiliary facets of an application
- Management of save & undo functionality
- Tracking of changes
- Easy way to access data and fill the UI
- Creation of the data model through a graphical object model editor, allowing lists of objects, one-to-many or many-to-many relationships, or constraints on object attributes
- Easy change and migration of data models
- Technical persistence: SQLite database
- Tradeoff: Less influence on the SQLite level, as CoreData is an abstraction layer. Besides that, we have slightly different data models and source code between the platforms.

Selecting CoreData as the persistence framework still leaves many architectural decisions open. One is how to utilize the concept of managed object contexts. We illustrate different alternatives of using managed object contexts, as we first went
for one alternative and then decided later to go for the other one and changed the implementation.

A managed object context represents a single "object space" or scratchpad in an application and holds a "copy" of an object in the SQLite store. CRUD operations are performed on a managed object context. The initial solution for the sync concept used a hierarchical order of contexts (see Fig. 2 a).

[Figure: (a) hierarchical contexts — «Data» WorkerCtx 1…n as children of the «Data» UICtx, itself a child of the «Data» RootCtx with data access to the «Component» SQLite DB; (b) parallel contexts — «Data» UICtx and «Data» BackgroundCtx, each with direct data access to the «Component» SQLite DB]
Fig. 2: (a – left) Hierarchical order of managed object contexts | (b – right) Parallel managed object contexts

The root context has direct access (through its persistent store coordinator) to the SQLite store. It runs in a background thread, enabling CRUD operations to have less impact on the performance of the UI. The UI context runs in the main thread and is responsible for providing data to the UI. As it is a child context of the root context, it fetches its data from the root context. When there is a need for synchronization to the backend or for CRUD operations, a worker context is requested. Each worker context is a child context of the UI context, enabling it to operate separately from the UI context (e.g. CRUD operations are only visible to the UI context after a save of the worker context has been performed). To persist data from one worker context to the database, three saves have to be performed (from worker to UI, from UI to root, and from root to the database). Due to the unnecessary complexity and maintainability problems, this decision was discarded and the decision to use parallel managed object contexts was taken, resulting in a much simpler design.

Decision: Use of a hierarchical order of managed object contexts (discarded later and replaced by parallel contexts)
Drivers addressed:
- Clear separation of operations (displaying of data, sync of data, and CRUD operations on data)
- Highest priority for the UI
Rationale, explanations & tradeoffs:
- Persisting data from the root context to the database runs on a background thread, resulting in the UI not being blocked by this operation
- The UI has up-to-date data, as it is a mediator between sync/CRUD operations and the database/root
- Data from worker contexts is not visible to the UI until a successful sync to the backend or a save of a CRUD operation
- Tradeoff: Lower understandability and maintainability by having multiple worker contexts in parallel at the same time

Now, two contexts are used in parallel to access the SQLite store (see Fig. 2 b). The UI context runs on the main thread and the background context on a background thread. The UI context is used in the same way as in the previous concept, whereas the background context is responsible either for a sync or for CRUD operations.

Decision: Use of parallel managed object contexts
Drivers addressed:
- Performance issues during a save
Rationale, explanations & tradeoffs:
- The hierarchical order of contexts slowed down saving to the database tremendously
- Parallel contexts allow much better performance, as the UI context is not being influenced by a save of a background context
- Tradeoff: Only one background context is available, which reduces the flexibility of dealing with data changes at the same time

3) Decisions on Backend Data Handling

Decision: Realize multi-tenancy with a discriminator ID for all multi-tenancy-related entities
Drivers addressed:
- Multi-tenancy capability of the backend: it has to serve the apps of multiple customer organizations, all of which can have many users
- Clear separation of the data of multiple organizations. No one is allowed to see the data of other organizations
- Multi-tenancy should support cost-efficient operation of the ecosystem, and in particular of the backend, for John Deere
Rationale, explanations & tradeoffs:
- Each customer organization (mainly farming companies) is treated as an individual tenant
- The data of all tenants is stored in the same database and in the same tables. This makes the management of databases easier and more cost-efficient than introducing new databases for new customers
- Each customer organization is identified in the system with an organizationID. This organizationID is added as an attribute to each entity that has multi-tenancy-related data (e.g. the fields of a farm always belong to a certain farm)

Decision: Allow a mixture of provider-created data and tenant-created data
Drivers addressed:
- Not all data is purely tenant-specific. There is also data that has to be provided by John Deere to the customers (e.g. the list of crops). This covers the most important data for all customers.
- However, the possibility is needed for a customer to extend such data for his own purpose only. Other organizations must not be affected by this.
Rationale, explanations & tradeoffs:
- This decision builds on the previous decision on how to realize multi-tenancy
- Data that is provider-created is marked with organizationId = null, while tenant-created data is marked with the respective organizationId of the tenant
- All data marked with organizationId = null is read-only for the clients and accessible by all clients

4) Decisions on Data Modeling

Decision: Keep the data models on the apps and the backend conceptually as similar as possible
Drivers addressed:
- Maintainability: easy extension of the data model with new entities and attributes
Rationale, explanations & tradeoffs:
- Highly similar data models are easier to understand
- Less modeling effort (only a single model), clearer communication
- Highly similar data models allow for highly automated serialization/deserialization of data
- Slight differences are necessary for technical attributes (e.g. annotating the organizationId to realize the multi-tenancy concept)
- Tradeoff: Higher coupling of the data models on app and backend is less robust and introduces additional challenges for evolution

Decision: Different physical representations of the conceptual data model for different purposes and in different layers
Drivers addressed:
- Deal with technical differences like object orientation and the relational model
- Comply with the enterprise rule to use REST-based APIs with JSON data format
- Deal with the specifics of technologies like CoreData on iOS

Rationale, explanations & tradeoffs:
- See Fig. 3 for the different technical representations and where they are used: relational tables in the MySQL database, Java Entities for the business logic (Manager), Java DTOs for the API Services (supporting multiple versions of the services at the same time for evolution scenarios), JSON DTOs for the transport between app and backend, CoreData objects in the iOS apps, and Java Entities in the Android apps.
- Further technical differences exist: CoreData always maintains bi-directional dependencies while the other models are uni-directional; Java Entities and relational tables realize relationship directions differently due to the nature of the relational model.
- Tradeoff: While the technical representations are necessary due to technical constraints and other architectural drivers, they strongly decrease maintainability, as each model change has to be reflected in all technical representations.

Fig. 3: Physical data representations of the conceptual model (component diagram omitted; it shows CoreData objects in the iOS app and Java Entities in the Android app, JSON DTOs on the wire, and, in the backend, Java DTOs in the Services, Java Entities in Manager, Data Access Object, and EntityManager, down to relational tables in the MySQL database server)

Decision: Modeling of data in different domains with top-level aggregates and access restrictions [20]
Drivers addressed:
- Better understandability of individual parts of the data model
- Maintainability of the data model: independent changes in the domains

Rationale, explanations & tradeoffs:
- The data model is separated into domains like persons, (agricultural) fields, equipment, products, jobs, …, which can have multiple entities.
- Across domains, only references to the top-level entities of a domain (the aggregates) are allowed, e.g. to a field.
- This way of modeling typically implies a modeling direction from the top-level entities to further detailed entities.
- Tradeoff: It turned out that some of the detailed entities have much higher creation/change frequencies than the top-level entities (e.g. the position of a machine changes much more often than the machine data itself). This led to problems with the synchronization mechanism, as adding a new detailed entity means updating the top-level entity, which increases the likelihood of conflicts.

5) Decisions on Data Import / Export
This section focuses on the integration of a weather service.

Decision: Weather data is fetched on the backend and in turn is distributed to the relevant clients
Drivers addressed:
- Weather data shall be available for all fields and customer organizations
- The number of API calls to the weather provider should be minimized, as API calls cost money

Rationale, explanations & tradeoffs:
- Fetching the data directly from the clients would have resulted in redundant API calls, since several clients would have requested weather data for the same fields.
- With this approach, there is no provider-specific code on the clients.
- Weather data can be stored on the backend for later analysis.
- Tradeoff: Some minor delay in the availability of data because of the additional communication step (client ↔ Weather Data API vs. client ↔ backend ↔ Weather Data API).

Decision: Weather data is retrieved for geographical areas instead of individual fields
Drivers addressed:
- Weather data shall be available for all fields and customer organizations
- The number of API calls to the weather provider should be minimized, as API calls cost money

Rationale, explanations & tradeoffs:
- Weather data typically comes with a granularity on city / zip code level, i.e. fields that are located near a specific city will get an identical forecast.
- The system therefore first maps fields to geographical areas based on the closest city.
- This mapping is achieved by passing the GPS coordinates of the field to a reverse geocoding service. Typically, all Weather Data APIs offer such a service, but there are also free alternatives.
- For all fields that are located in the same area, the weather data is requested only once with a single API call, no matter whether these fields belong to the same or a different organization.
- This approach enables us to reuse weather data not only within an organization, but also across all tenants of the system. It highly benefits from the multi-tenancy capability of the backend, as it can strongly reduce the cost of weather data for customer organizations.
- Tradeoff: Additional calls to the reverse geocoding service have to be executed for every field.

6) Decisions on Security

Decision: Data access authorization is completely handled by the backend
Drivers addressed:
- Simple client implementation regarding data access rules
- Maintainability: simple role management

Rationale, explanations & tradeoffs:
- Whenever data (or a subset of it) is delivered to an app, the backend checks whether it is allowed to be delivered. Authorization is granted based on many attributes, such as user, role, and assigned company.
- The main advantage is that the user is allowed to read and also modify all data that leaves the backend. Special "read-only" data can be tagged to inform the client about this constraint.

Decision: Encrypt communication and critical data storage
Drivers addressed:
- Ensure confidentiality and integrity of data transfers
- Critical data, such as login credentials, must be stored encrypted

Rationale, explanations & tradeoffs:
- Data transfers must be confidential due to the sensitivity of the data that is exchanged.
- The client as well as the backend have to store all credentials encrypted. On the iOS client side, for example, this is done by using the system's keychain.
- Tradeoff: In case an iOS app gets deleted, the credentials that have been stored in the keychain will not be deleted and are therefore automatically used the next time an ecosystem app is installed. To avoid this behavior, the user must sign out before deleting the app.
- Another important aspect is backups of the entire client: they must not contain any OAuth credentials. On the iOS clients, for example, this is solved by instructing the keychain to keep the data on the device only.
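The field-to-area mapping from the weather decisions above can be sketched roughly as follows. The reverse geocoder and the weather provider are stubbed as plain functions, and all names are illustrative, not the project's real API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of the weather decisions above: fields are first mapped to a
// geographical area via reverse geocoding, and one paid weather API call is
// made per area, shared across all tenants. Names are illustrative.
public class WeatherByArea {

    public record Field(String id, String orgId, double lat, double lon) {}

    // Returns one forecast per distinct area; the size of the result equals
    // the number of paid weather API calls that were made.
    public static Map<String, String> fetchForecasts(
            List<Field> fields,
            Function<Field, String> reverseGeocode,   // GPS -> closest city / area key
            Function<String, String> weatherApi) {    // area key -> forecast (paid call)
        Map<String, String> forecastByArea = new HashMap<>();
        for (Field f : fields) {
            String area = reverseGeocode.apply(f);            // one geocoding call per field
            forecastByArea.computeIfAbsent(area, weatherApi); // one weather call per area only
        }
        return forecastByArea;
    }

    public static void main(String[] args) {
        List<Field> fields = new ArrayList<>(List.of(
                new Field("f1", "org1", 49.44, 7.75),
                new Field("f2", "org2", 49.45, 7.76),   // different tenant, same city
                new Field("f3", "org1", 48.78, 9.18)));
        // Stubbed geocoder: the first two fields resolve to the same area.
        Function<Field, String> geo = f -> f.lat() < 49.0 ? "Stuttgart" : "Kaiserslautern";
        Map<String, String> result = fetchForecasts(fields, geo, area -> "forecast-" + area);
        System.out.println(result.size() + " weather API calls for " + fields.size() + " fields");
    }
}
```

The tradeoff from the decision is visible in the code: the geocoder is called once per field, while the paid weather API is called only once per area.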

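The JSON DTOs used for the app/backend transport above run into JSON's lack of type information as soon as DTOs are polymorphic. A common workaround, sketched here with stdlib Java only, is an explicit type discriminator that serialization writes and deserialization dispatches on; the project itself used extension classes, and all names below are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// JSON carries no Java type information, so polymorphic DTOs need an explicit
// "type" discriminator: written on serialization, dispatched on during
// deserialization. Stdlib-only sketch with illustrative names.
public class PolymorphicJson {

    public interface EquipmentDto {}
    public record TractorDto(int horsePower) implements EquipmentDto {}
    public record HarvesterDto(double headerWidth) implements EquipmentDto {}

    public static String toJson(EquipmentDto dto) {
        if (dto instanceof TractorDto t)
            return "{\"type\":\"tractor\",\"horsePower\":" + t.horsePower() + "}";
        if (dto instanceof HarvesterDto h)
            return "{\"type\":\"harvester\",\"headerWidth\":" + h.headerWidth() + "}";
        throw new IllegalArgumentException("unknown DTO: " + dto);
    }

    // Read the discriminator first, then parse the subtype-specific payload
    // (a crude regex stands in for a real JSON parser here).
    public static EquipmentDto fromJson(String json) {
        Matcher m = Pattern.compile("\"type\":\"(\\w+)\"[^:]*:([0-9.]+)").matcher(json);
        if (!m.find()) throw new IllegalArgumentException(json);
        return switch (m.group(1)) {
            case "tractor" -> new TractorDto(Integer.parseInt(m.group(2)));
            case "harvester" -> new HarvesterDto(Double.parseDouble(m.group(2)));
            default -> throw new IllegalArgumentException(m.group(1));
        };
    }

    public static void main(String[] args) {
        EquipmentDto dto = new HarvesterDto(7.5);
        // The round trip preserves the concrete subtype.
        System.out.println(fromJson(toJson(dto)).equals(dto));
    }
}
```

The tradeoff is the same one noted in our lessons learned: app and backend must agree on the discriminator values, which couples their data models further.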
Decision: Sharing the OAuth access token via iOS keychain group access for Single Sign-On
Drivers addressed:
- Single Sign-On mechanism for all apps of the ecosystem
- Usability: faster and more comfortable app starts

Rationale, explanations & tradeoffs:
- Enabling the apps of the ecosystem to perform a Single Sign-On requires the use of just one client token and secret. Using a different client token for every app would mean executing the login procedure for every app.
- Depending on the platform used, there are multiple options to share data between apps. On iOS clients, for example, the iOS keychain with its group access concept is perfect for this purpose.
- It is important to check, every time the client starts or resumes, that the existing data belongs to the credentials stored in the iOS keychain and that the access token hasn't changed. Thus, a unique identifier is generated after every login. It is stored locally within the client application and, e.g., within the iOS keychain. If the identifiers are not equal when a client starts or resumes, all data must be deleted and – if new login data is available – fetched again.

The decisions on security show that authorization of data access strongly depends on how mobile devices are intended to be used (multiple users of the same device, multiple apps with single sign-on). Additionally, they show that a very good understanding of the relevant mobile frameworks is necessary to be able to design adequate solutions (e.g. usage of keychain group access to share credentials among isolated apps).

7) Decisions on High Performance

Decision: Treat data with a high change frequency (~ 1 update / second) with CQRS and a Cassandra database (Fig. 4)
Drivers addressed:
- At a quite late point in time, support for a new type of app became necessary: an app that can show, in near-real-time, the positions and other critical information about machines. The app is connected to a machine and can thus send its data and display the data of other machines working in the same team.
- A manager wants to get an overview of all his machines with the critical data points.
- The desired update frequency is, depending on the data elements, between 1 second and a few minutes.

Rationale, explanations & tradeoffs:
- This type of data is completely different from the master data and the transactional data treated in the system before: the key differences are the change frequency, the fact that the data is written and never updated, and that new data elements always replace old ones (the new position of a machine replaces the old position).
- Our concept of data exchange with the sync mechanism is not usable at all for these requirements, as sync takes too long and introduces way too much overhead for this communication.
- Thus, we introduced a new data connector between the backend and the apps for this new data. The apps just write bundled new data in the form of StatusUpdateEvents, and they can read optimized StatusUpdateObjects, which bundle all the relevant data for them.
- In the backend, we used the CQRS pattern (Command Query Responsibility Segregation) [29]. By that, reading and writing can be scaled independently. Writing simply puts the StatusUpdateEvents into the Cassandra database and writes into a queue. Then, an EventProcessor reads the StatusUpdateEvents and incorporates them into the read model.
- The read model is prepared for the current data requestors and kept in-memory for high-performance delivery of data.
- Cassandra is used as it provides way higher write performance than a traditional relational database and better scalability. The data stored is mainly for historical analyses but not needed for normal system operation.
- Tradeoff: Maintaining different databases and database types leads to increased learning, maintenance, and operation effort.

Fig. 4: CQRS pattern & Cassandra for high-frequency data (component diagram omitted; it shows the High Frequency Data App on the mobile device writing StatusUpdateEvents to a Write Service and reading preprocessed StatusObjects from a Read Service; in the backend, a StatusUpdateEventProcessor consumes a StatusUpdateEvent queue to build the read model, and the events are stored in a Cassandra database server)

Decision: Caching of entity references during data import
Drivers addressed:
- Bad entity lookup performance

Rationale, explanations & tradeoffs:
- When importing a new object into the client database, we made two fetches: one to make sure that this object isn't already in our database, and one to get the object to set its relationships (after the create/update phase). This works quite well for small datasets but didn't scale for large imports.
- Optimizing this is possible by limiting the lookups so that not more than one entity gets fetched.
- In addition, we cache the references to new and fetched entities in a hashmap with their identifier as the key, so that every further lookup is much faster.

Decision: Differentiate between initial and delta synchronization
Drivers addressed:
- Bad data import performance

Rationale, explanations & tradeoffs:
- During data import, we first look up each object in the database before updating or creating it. Unfortunately, this is a mandatory step to avoid duplicates. Our measurements reveal that this is a time-consuming step.
- In addition to the caching decision explained above, we differentiate between the initial (no data exists in the database) and the delta (data already exists at least partly) synchronization case. In the case of the initial synchronization, the lookup for existing data can be completely skipped. With these two measures, we significantly reduced the time of the initial data import from 105 seconds to just 1.7 seconds when importing a test dataset.
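The combined effect of the two import decisions above, a reference cache plus skipping lookups in the initial case, can be sketched as follows; the counter makes the saved database fetches visible. All names are illustrative, not the project's code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the two import-performance decisions above: a hashmap
// caches entity references by identifier, and the initial-sync case skips
// the duplicate-check lookups entirely. Illustrative names only.
public class SyncImporter {

    public record Entity(String id, String name) {}

    private final Map<String, Entity> database = new HashMap<>();
    private final Map<String, Entity> cache = new HashMap<>();
    public int databaseLookups = 0;

    public SyncImporter(Map<String, Entity> existingData) {
        database.putAll(existingData);
    }

    private Entity fetchFromDatabase(String id) {
        databaseLookups++;   // the expensive operation we want to avoid
        return database.get(id);
    }

    // initial == true: database is known to be empty, no duplicate checks.
    // initial == false (delta): consult the cache first, hit the database
    // at most once per identifier.
    public void importAll(List<Entity> incoming, boolean initial) {
        for (Entity e : incoming) {
            if (!initial && !cache.containsKey(e.id()))
                fetchFromDatabase(e.id());   // duplicate check / relationship target
            database.put(e.id(), e);         // create or update, no duplicates
            cache.put(e.id(), e);            // later references resolve in memory
        }
    }
}
```

A delta import thus needs at most one database fetch per distinct identifier, and an initial import needs none, which is the effect behind the reported drop from 105 to 1.7 seconds.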

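The CQRS split from the high-performance decision above, with Cassandra and the real queue replaced by in-memory stand-ins, might look roughly like this (illustrative names, not the project's code):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Rough sketch of the CQRS decision above: writes only append
// StatusUpdateEvents; a processor folds them into an in-memory read model
// that always holds the latest status per machine. The event log stands in
// for Cassandra, the deque for the real queue. Illustrative names only.
public class StatusCqrs {

    public record StatusUpdateEvent(String machineId, double lat, double lon, long ts) {}

    private final List<StatusUpdateEvent> eventLog = new ArrayList<>();  // stand-in for Cassandra
    private final Deque<StatusUpdateEvent> queue = new ArrayDeque<>();   // stand-in for the queue
    private final Map<String, StatusUpdateEvent> readModel = new HashMap<>();

    // Write side: append only, data is written and never updated.
    public void write(StatusUpdateEvent e) {
        eventLog.add(e);
        queue.add(e);
    }

    // EventProcessor: drain the queue; the newest event per machine wins,
    // i.e. a new position replaces the old one in the read model.
    public void process() {
        StatusUpdateEvent e;
        while ((e = queue.poll()) != null)
            readModel.merge(e.machineId(), e,
                    (old, cur) -> cur.ts() >= old.ts() ? cur : old);
    }

    // Read side: served from the prepared in-memory model.
    public Optional<StatusUpdateEvent> latest(String machineId) {
        return Optional.ofNullable(readModel.get(machineId));
    }
}
```

Note how reads and writes never touch the same structure, which is what lets the two sides scale independently; the full history stays in the append-only log for later analysis.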
Decision: Forego inheritance in the core data model
Drivers addressed:
- Big database files
- Bad save performance

Rationale, explanations & tradeoffs:
- The second time-consuming task was persisting the core data model in memory to the database. In our data model, we used inheritance to propagate some properties, such as an identifier, to almost every entity.
- When using inheritance, Core Data creates a wide and tall table to map this feature to its abstract data model. In fact, this table can cause a performance penalty and leads to many NULL values in it. Besides that, it blows up the database file in size.
- Avoiding inheritance by adding the properties of the super entity to all other entities leads to a smaller database file and much faster Core Data performance. With all these decisions, the overall synchronization time has been reduced from about three minutes to less than a minute.

B. More Architectural Drivers with Impact on Data
In addition to the decisions we described in the previous section, we also had to deal with various other architectural drivers that are directly or indirectly related to data. These are briefly summarized below.

We prototyped our apps for Android and iOS devices. While iOS is a closed ecosystem with relatively strict rules and policies, Android is rather open, giving architects and developers typically many alternative ways to realize a solution. Especially regarding data model and persistence, we had to take several decisions to define an overall architecture that on the one hand is as consistent as possible across all components, but on the other hand also considers the specifics and strengths of each particular platform.

Our data model and services had to support versioning. This was especially important because the system is supposed to be continuously extended over time, but also has to be able to support older versions of the apps that are installed on customer devices, since it is typically not possible to force mobile users to upgrade their apps instantly.

Another crucial requirement was the integration of a push notification feature. Specific users have to be proactively notified when specific data has been created, updated, or deleted by another user of the system. The main challenge here was to design a module that intercepts the relevant data changes and that also takes the decision when and to whom notifications should be sent.

Further sources of architectural challenges around data are internationalization, media data, and data validation.

C. Collected Experiences and Lessons Learned
The following lessons learned originate from our design and implementation work in the prototyping of the described app ecosystem. We generalized them in the light of other experiences, and we are confident that they can support researchers and practitioners in many other contexts.

1) Lessons Learned on Technical Aspects
Offline synchronization does not come out of the box. Although there is numerous database replication technology around, it is often not enough to come up with a good offline user experience. Thus, offline capability has to be treated in close collaboration of UX designers and software architects to come up with a solution that fits the needs of the system at hand.

Offline synchronization is really costly to develop. Our worst under-estimation of effort was for the synchronization solution, simply because we didn't know enough about it and had to learn it the hard way. Existing out-of-the-box solutions like Couchbase seemed too risky, and we went for a solution developed from scratch. However, after all, we learned a lot about solution options and can now transfer them to product development.

Data management frameworks like CoreData require in-depth knowledge and deliberate architecture decisions. Many people still think that for the internals of an app there is no or not much architecture necessary. Just looking at how to represent data with a framework like CoreData shows us the opposite. Without a clear architectural understanding of the internals of the relevant technologies and clear architecture decisions on how to use them, many hard-to-debug problems will occur.

2) Lessons Learned on Quality Attributes
High performance and great user experience need many architecture decisions to bring the right data to the right place at the right point in time. These architecture decisions deal with data partitioning, prefetching, selective loading, caching, paging, etc. Such architecture decisions typically lead to increased technical complexity and make implementation tasks harder, as they cannot be completely hidden behind a technical framework.

Security concepts like fine-grained access control of roles to certain entities or even objects can be the cause of hard-to-find bugs. Bugs like incorrect access control lists lead to the delivery of data sets that are not consistent anymore, since some objects are filtered out but still referenced. In particular, in combination with the offline synchronization concept, this leads to hard-to-find bugs. A source of these bugs is that the evolution of the data model and the physical data structures for the security information have to be kept consistent at all times.

Upgradeability introduces additional technical complexity of data handling. There is no possibility to upgrade all mobile devices and the backend at the same time. That means that the backend has to support at least two versions of the apps at the same time. When the data model is changed, this means that different APIs are needed and somehow have to be mapped to the same databases. Such a redundancy of APIs with different versions is challenging, as in the end the data of old and new app versions has to be treated in the same way in the same backend, which can require additional physical data representations and transformations.

Nearly all architecture decisions around data in favor of other quality attributes have an adverse impact on maintainability. Improving quality attributes like performance, security, or upgradeability nearly always introduces new architectural concepts and often new technical representations of data. The consequence is that the system becomes technically more complex, and this technical complexity often interferes with the business complexity, as it cannot be fully separated. This on the one hand makes the system harder to understand, and on the other hand changes come with more impact. Finally, testing also becomes more difficult, and thus the overall maintainability is reduced.
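The security lesson above (filtering plus references can produce dangling references) can be made concrete with a small post-filter consistency check. The names are hypothetical, and the real system's ACL model is considerably richer:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of the security lesson above: if authorization filters objects out
// of a delivered data set, remaining objects may still reference them. A
// consistency check after filtering makes such dangling references visible
// before the set is shipped to an offline client. Hypothetical names.
public class FilteredDelivery {

    public record Item(String id, String orgId, String refId) {}  // refId may be null

    // Authorization: a user only receives items of their own organization.
    public static List<Item> authorize(String userOrg, List<Item> all) {
        return all.stream()
                .filter(i -> i.orgId().equals(userOrg))
                .collect(Collectors.toList());
    }

    // Returns ids that are referenced but no longer contained after filtering.
    public static Set<String> danglingReferences(List<Item> delivered) {
        Set<String> ids = delivered.stream().map(Item::id).collect(Collectors.toSet());
        return delivered.stream()
                .map(Item::refId)
                .filter(r -> r != null && !ids.contains(r))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Item> all = List.of(
                new Item("machine1", "org1", "field9"),
                new Item("field9", "org2", null));   // belongs to another tenant
        List<Item> delivered = authorize("org1", all);
        // machine1 still references field9, which the filter removed.
        System.out.println(danglingReferences(delivered));
    }
}
```

Whether such a check repairs, rejects, or merely logs the inconsistent delivery is itself an architecture decision; the point is that filtering and referential integrity have to be designed together.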

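The upgradeability lesson above boils down to serving several DTO generations from one persisted entity. A hypothetical sketch of such version-specific mapping (none of these types are the project's real API):

```java
// Two generations of apps call version-specific services, so two DTO shapes
// have to be mapped onto the one entity stored in the database. A write from
// an old app must not wipe attributes it does not know about. Illustrative
// names only.
public class VersionedApi {

    public record FieldEntity(String id, String name, double areaHa) {}
    public record FieldDtoV1(String id, String name) {}                  // old app generation
    public record FieldDtoV2(String id, String name, double areaHa) {}   // new app generation

    // Both API versions are served from the same persisted entity.
    public static FieldDtoV1 toV1(FieldEntity e) {
        return new FieldDtoV1(e.id(), e.name());
    }

    public static FieldDtoV2 toV2(FieldEntity e) {
        return new FieldDtoV2(e.id(), e.name(), e.areaHa());
    }

    // Merging a v1 write keeps the attributes the old app cannot see.
    public static FieldEntity mergeV1(FieldDtoV1 dto, FieldEntity current) {
        return new FieldEntity(dto.id(), dto.name(), current.areaHa());
    }
}
```

The merge method is where the cost shows up: every attribute added in a newer model version needs an explicit preservation rule for every older API still in service.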
3) Lessons Learned on Data Modeling
Typically, there is not a single physical data model. Different technologies require a technically adjusted representation of data, and the conceptual data model has to be mapped to the physical representations (e.g. OO, relational, …). In the architecture, it should be very clearly described how the physical representations differ and which types of components / layers are supposed to work with which of the physical representations.

JSON does not support natural serialization / deserialization of polymorphic objects. JSON has become the standard way for hybrid systems to communicate with each other, for good reasons. It has several advantages over potential alternatives like XML, including readability and simplicity. However, when exchanging complex data objects between higher programming languages, you will realize that JSON does not support serialization / deserialization of polymorphic objects out of the box. There are several ways to circumvent this problem (we decided to introduce extension classes), but each comes with some tradeoffs that have to be closely examined.

Change frequency of data can impact the directions of relationships. In order to avoid frequent updates of stable data due to referenced objects, there should be a clear analysis and categorization of the change frequency and the life-cycle of data. As a rule of thumb, data with a higher change frequency should relate to master data. However, turning the directions of relationships around often comes with changed ways of accessing data, which often seem less intuitive.

Simply following guidelines of OOA / OOD [19] or DDD [20] often does not lead to adequate architectures and data models. OOA / OOD / DDD do not discuss the impact of data-related decisions on quality attributes in enough detail. In particular, combined with the observation that there are several physical representations of the data model, the combination of data and logic as proposed by OO only makes sense in a kind of domain / business logic layer.

Classifications of data like the one in "master data | transaction data | reference data" [28] can support data modeling but often leave questions open. In particular, the distinction between master data and transactional data supports the analysis of change frequency and provides modeling guidelines like: transactional data refers to master data and not vice versa. However, data classifications are often not conceptually unambiguous. Additionally, different data classifications exist, which are not matching and also not orthogonal. Thus, it is an architect's task to select an appropriate classification and to conclude what this means for modeling.

The reaction of the system to the deletion of master data has to be clearly defined and consistently realized across all components of the system. Master data can be seen as the glue that binds the other types of data together. Nevertheless, from time to time it might be necessary to delete portions of master data because they are no longer relevant. In these cases, it is important to decide what should happen to other data that has references to this master data. This might depend on the state of the concerned entity or the importance of the respective master data. However, for distributed systems like our prototypes, it is crucial that all components treat deleted master data in the same way, otherwise this will lead to inconsistencies and deadlocks that can hardly be resolved.

Sometimes data redundancy should be accepted when it reduces the complexity of the data model. When designing data models, you typically try to avoid redundancy of data. However, under specific circumstances, minor data redundancy should be tolerated when this leads to a leaner data model or allows improving performance.

4) Lessons Learned on Data in Architecting
Guidelines on data modeling are not well integrated in architecture design methods and the architecture community. Although many architectural decisions around data and data modeling have to be made, they seem to be a disconnected field from the perspective of software architecture. Thus, architects in practice have to be trained in both areas and have to draw conclusions for their architectural decisions. The architecture community should provide more guidance on treating data, refer to appropriate data modeling approaches, and put them into an architectural context (elaborating in particular the impacts on quality attributes).

Documentation of architecture decisions around data needs more support in the form of views and guidelines. In most of today's view frameworks for architecture documentation, data does not play an important role, or only partial aspects are addressed. The focus is rather on the decomposition of the system into components / modules. However, we found it very helpful for communication to also visually represent aspects around data, e.g. where data is created, where it is transported, where it is stored, and where which transformation between technical representations is done. Further visual representation is necessary for higher-level grouping of concepts like domains and the ownership of data in components.

Teaching of software design at universities often has no architectural perspective, in particular when it comes to data. Software design is often taught with OOA / OOD methods. On the other hand, in database-related lectures, entity-relationship modeling and relational models are in the focus. The real importance of architecture decisions around data is mostly neglected, and with the emergence of new and more diverse technologies like NoSQL databases, it is obvious that universities have to put more focus on data.

Be aware of overengineering: do not design the system more complex than necessary. There is no doubt that engineering principles and best practices such as design patterns and abstraction mechanisms are of great value and essential for developing software systems in good quality. However, we have to be careful that we do not apply them for their own sake, but only when they really fit the concrete situation. Aiming for simplicity is a worthwhile goal that is not easily achieved. However, in the long run it will pay off through improved maintainability.

V. CONCLUSIONS
In this paper, we elaborated the need for more attention to data in software architecture design, and in particular in the architecture community. In order to underline the importance of data, we contributed diverse conceptual and technological

architectural solutions around data from a large-scale app ecosystem prototyping project. Additionally, we summarized lessons learned with technical aspects and with respect to data modeling.

The software architecture research community should invest into efforts to give data a more prominent role in software architecture research and thus be of higher value for practitioners. Architecture documentation should provide more standardized ways of representing the most important aspects around data. In architecture design, more guidance should be provided on which architectural decisions around data have to be made and how certain quality attributes and scenarios can be achieved. Data modeling guidelines, also from other communities, should be integrated into architecture design in order to give practitioners more guidance. Many great and beneficial paradigms, architectural concepts, and realizing technologies around data handling exist, from other research communities, from open source communities, and from industry. The architecture research community could contribute to making these more accessible to architects by better extracting and documenting the architectural principles and impacts. A great example of this is [5] and the tutorials of these authors. It would also be beneficial to publish examples of architectures with a strong focus on data and their appropriate documentation. We tried to take a first step by sharing a real and large-scale example, from which we also discovered the missing attention to data in architecture design. Finally, the architecture research community could move more closely to other communities, like the data management community. Solutions for challenges like data synchronization are so complex that they need contributions from different communities.

Software architecture education should put more focus on data in software architecture, too. The importance of data has to be motivated, and the wide impact of architectural decisions around data has to be practically shown. The connection of data in OOA/OOD or DDD and in software architecture has to be made clearer. Larger-scale examples of well-elaborated and documented architecture decisions around data have to be presented. However, it has to be acknowledged that teaching software architecture is in general difficult due to the limited experience of students with large-scale systems.

Software architecture practitioners, if they haven't already done so in the past, should put more focus on data in the design of software systems. Not only in architecture design, but also in architecture evaluations, a closer look at decisions around data is necessary due to the high impact on the achievement of quality attributes. We would like to encourage practitioners to share their examples, experiences, and lessons learned, as we did with this paper, in order to help other practitioners and to make the research community aware of new challenges.

REFERENCES
[1] S. Ghemawat, H. Gobioff, S.T. Leung. The Google File System. ACM SIGOPS Operating Systems Review, Volume 37 Issue 5, December 2003.
[2] G. DeCandia et al. Dynamo: Amazon's Highly Available Key-Value Store. ACM SIGOPS Operating Systems Review, Volume 41 Issue 6, December 2007.
[3] A. Lakshman, P. Malik. Cassandra: a Decentralized Structured Storage System. ACM SIGOPS Operating Systems Review, Volume 44 Issue 2, April 2010.
[4] J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Sixth Symposium on Operating System Design and Implementation, 2004.
[5] I. Gorton, J. Klein. Distribution, Data, Deployment: Software Architecture Convergence in Big Data Systems. SEI, https://resources.sei.cmu.edu/asset_files/WhitePaper/2014_019_001_90915.pdf, 2014.
[6] L. Bass, P. Clements, R. Kazman. Software Architecture in Practice, 3rd ed. Addison Wesley, 2012.
[7] R.N. Taylor, N. Medvidovic, E.M. Dashofy. Software Architecture: Foundations, Theory, and Practice. Wiley, 2009.
[8] M. Ali Babar, T. Dingsoyr, P. Lago, H. van Vliet. Software Architecture Knowledge Management. Springer, 2009.
[9] P. Kruchten. 4+1 View Model of Software Architecture. IEEE Software 12 (6), November 1995.
[10] P. Clements et al. Documenting Software Architectures: Views and Beyond, 2nd ed. Addison Wesley, 2010.
[11] C. Hofmeister, R. Nord, D. Soni. Applied Software Architecture. Addison Wesley, 1999.
[12] J. Garland, R. Anthony. Large-Scale Software Architecture: A Practical Guide Using UML. John Wiley & Sons, 2003.
[13] N. Rozanski, E. Woods. Software Systems Architecture: Working With Stakeholders Using Viewpoints and Perspectives. Addison-Wesley Professional, 2005.
[14] ISO Reference Model of Open Distributed Processing (RM-ODP). http://rm-odp.wikispaces.com/ODP+Viewpoints
[15] The Open Group. TOGAF 9.1 Online. 2015. http://www.opengroup.org/subjectareas/enterprise/togaf
[16] P. Merson. Data Model as an Architectural View. CMU/SEI-2009-TN-024, 2009.
[17] T. Keuler, J. Knodel, M. Naab. Fraunhofer ACES: Architecture-Centric Engineering Solutions. Fraunhofer IESE, IESE-Report, 079.11/E, 2011.
[18] J. Zachman. Zachman Framework. http://www.zachman.com
[19] C. Larman. Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and Iterative Development, 3rd ed. Prentice Hall, 2004.
[20] E. Evans. Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison Wesley, 2003.
[21] M. Fowler. Patterns of Enterprise Application Architecture. Addison Wesley, 2002.
[22] C. Atkinson et al. Component-Based Product Line Engineering with UML. Addison Wesley, 2001.
[23] R.K. Lomotey, R. Deters. Management of Mobile Data in a Crop Field. International Conference on Mobile Services, 2014.
[24] A. Brodt, O. Schiller, S. Sathish, B. Mitschang. A Mobile Data Management Architecture for Interoperability of Resource and Context Data. International Conference on Mobile Data Management (MDM), 2011.
[25] W. Huaigu, L. Hamdi, N. Mahe. TANGO: A Flexible Mobility-Enabled Architecture for Online and Offline Mobile Enterprise Applications. International Conference on Mobile Data Management (MDM), 2010.
[26] W. Vogels. Eventually Consistent. Communications of the ACM, 52:40, 2009.
[27] Hibernate. http://hibernate.org
[28] Semarchy. Back to Basics: Transactional, Master, Golden and Reference Data explained. http://www.semarchy.com/semarchy-blog/backtobasics_data_classification/, 2012.
[29] M. Fowler. CQRS. http://martinfowler.com/bliki/CQRS.html, 2011.
