Publications Office

Production and dissemination of the


Supplement to the Official Journal of the
European Union: TED website, OJS DVD-ROM
and related offline and online media
Software Architecture Document

Subject Software Architecture Document

Version / Status 1.00

Release Date 28/04/2010

Filename 702881173.doc

Document Reference TED-SAD


Production and dissemination of the
Supplement to the Official Journal of the
European Union: TED website, OJS DVD-ROM
and related offline and online media

Ref: TED-SAD Software Architecture Document Version: 1.00

Table Of Contents

1 Introduction
1.1 Purpose of the Document
1.2 Scope of the Document
1.3 Intended Audience
2 Reference and Applicable Documents
3 Acronyms and Abbreviations
4 Architectural Representation
5 Logical View
5.1 TED Website
5.1.1 Overview
5.1.2 Web Layer Design Package
5.1.3 Service Layer Design Package
5.1.4 Domain layer
5.1.5 Data access layer
5.1.6 General Principles
5.2 Monitoring data-warehouse
5.2.1 BIRT
5.2.2 Cacti
5.2.3 Webalizer
5.3 License Holder environment
5.3.1 Authentication and logging
5.4 Email analysis and notifications
5.5 Workflow engine
5.5.1 Validation and files transformation
5.5.2 PDF generation and time-stamping
5.5.3 Indexing
5.5.4 DVD image creation
5.5.5 Contracting authority notification
5.6 Notice viewer
6 Implementation View
6.1 TED Website
6.1.1 Overview
6.1.2 TED XSL transformation
6.2 Email analysis and notifications
6.3 Workflow engine
6.3.1 The workflow engine package
6.3.2 The workflow engine Implementation
6.3.3 The Indexing Implementation
6.3.4 The workflow Transformation
6.3.5 The workflow management tool
6.4 Ted System i18n support
6.5 Notice viewer
6.6 Reference data
6.6.1 Reference data: Deletion
6.6.2 Reference data: Addition
6.6.3 Reference data: Modification
6.7 Content modification
6.7.1 Addition of a new form
6.7.2 Modification of reference data
6.8 Application Dependencies
6.9 Backup procedure
6.9.1 Daily Back-end backup procedure
6.9.2 Daily Front-end backup procedure
6.9.3 Daily Data warehouse backup procedure
6.9.4 Daily Common backup procedure
7 Data View
7.1 MySQL cluster
7.2 Technical Columns
7.2.1 Audit segment
8 Deployment view
8.1 Network File System server
8.1.1 TED repository file system
8.1.2 TED temporary backup file system
8.1.3 TED mirror backup file system
8.1.4 Windows XP via VMWare
8.2 James email Servers
8.2.1 DNS Configuration
8.2.2 Spam folders
8.3 Database Organisation

LIST OF TABLES

Table 1: Reference Documents
Table 2: Applicable Documents
Table 3: TED XSL Transformation
Table 4: Indexed fields
Table 5: Modified On and Version columns
Table 6: Modified By column

LIST OF FIGURES

Figure 1 TED system modules
Figure 2 TED website responsibility based layers
Figure 3 Integration of Spring MVC with other layers
Figure 4 Flow of a Request through Spring Security Filters
Figure 5 CACTI Active Session diagram
Figure 6 Webalizer Traffic Analysis diagram
Figure 7 Webalizer Summary by Month diagram
Figure 8 Production management steps
Figure 9 Web application file structure
Figure 10 Workflow engine package Structure
Figure 11 Flow Service class diagram
Figure 12 Indexing Flow
Figure 13 Notice Viewer structure
Figure 14 Reference Data Model
Figure 15 Database replication with MySQL Cluster
Figure 16 TED Deployment diagram
Figure 17: Deployment diagram

1 INTRODUCTION

1.1 PURPOSE OF THE DOCUMENT


The aim of this document is to provide a comprehensive architectural overview of the TED system.
This document describes how functional analysis and use cases are translated and structured in the
architecture by the development team.

1.2 SCOPE OF THE DOCUMENT


This document presents the technical architecture of the TED system. In this document, we focus on
the choices made for the TED system. Hereafter, the readers will find information about the
frameworks, tools and technologies used by the TED system.

1.3 INTENDED AUDIENCE


The present document is intended to be read by the following people:
• Publishing operation team;
• Publications Office Project Team;
• Developments Project Team.

2 REFERENCE AND APPLICABLE DOCUMENTS


This section lists all reference and applicable documents. When referring to any of the
documents below, the bracketed reference will be used in the text, such as [R01].
REFERENCE DOCUMENTS
Ref. | Title | Reference | Version | Date
R01 | TED-FSP-Functional Specifications | TED-FSP | 1.00 | 07/09/2009
R02 | TED-DML-Data Model | TED-DML | 1.00 | 28/04/2010

Table 1: Reference Documents

APPLICABLE DOCUMENTS
Ref. | Title | Reference | Version | Date
A01 | General Invitation to Tender, Production and dissemination of the Supplement to the Official Journal of the European Union: TED website, OJS DVD-ROM and related offline and on line media, Specifications | N° 10186 | N/A | 06/01/2009
A02 | Hybrid service contract, Production and dissemination of the supplement to the Official Journal of the European Union: TED Website, OJS DVD-ROM and related offline and on line media | N° 10186 | N/A | 06/01/2009
A03 | Project Quality Plan | TED-PQP | 1.01 | 08/09/2010

Table 2: Applicable Documents

3 ACRONYMS AND ABBREVIATIONS


ABBREVIATIONS AND ACRONYMS
Abbreviation Meaning
AOP Aspect Oriented Programming
API Application Programming Interface
CPV Common Procurement Vocabulary
CRUD Create Retrieve Update Delete
DAO Data Access Object
ECMT European Commission Machine Translation
FTP File Transfer Protocol
HTTP HyperText Transfer Protocol
IoC Inversion of Control
JAR Java Archive
JDK Java Development Kit
JEE Java Enterprise Edition
JSP Java Server Page
JTA Java Transaction API
LGPL GNU Library or Lesser General Public License.
MVC Model View Controller
NUTS Nomenclature des Unités Territoriales et Statistiques
OJS Official Journal Supplement
OOD Object-Oriented Design
OPOCE Office des Publications Officielles des Communautés Européennes
POJO Plain-Old Java Object
TED Tenders Electronic Daily
UDF Universal Disk Format
URL Uniform Resource Locator
WAR Java Web Archive

4 ARCHITECTURAL REPRESENTATION
This document is part of the Technical Specification of the TED System, the result of the design
phase.
This document presents the views needed to represent the software architecture:
• The Logical View presents the decomposition of the software architecture into subsystems
and packages;
• The Implementation View describes the overall structure of the implementation model, the
decomposition of the software into layers and subsystems;
• The Data View describes the persistent data storage perspective of the system;
• The Deployment View describes the physical infrastructure on which the TED software is
deployed and run. It specifies the physical nodes and network configuration that execute the
software, and also maps the processes defined in the Process View onto physical nodes.

5 LOGICAL VIEW
The Logical View presents an overview of the architecture and then provides the decomposition of the
software into design packages and sub-systems.
The TED system has been decomposed into six distinct modules represented in the next figure:

Figure 1 TED system modules

• TED website: This module represents all the components needed for the public web interface
of the TED system. It is described in detail in the TED Website section.
• Monitoring Data-warehouse: This module contains the data-warehouse user interface. It is
described in detail in the Monitoring data-warehouse section.
• License Holder environment: The environment available to subscribers with privileged
access to the contents of TED. This module is described in detail in the License Holder
environment section.
• Email analysis and notifications: This module is responsible for mailing notifications and
analysing received emails. This module is described in detail in the Email analysis and
notifications section.
• Workflow engine: The workflow engine module contains all the components used for the
production management of the documents on the TED system. This includes indexing,
creation of DVD images, file transformations and the production dashboard. This module is
described in detail in the Workflow engine section.
• Notice viewer: The notice viewer is a simple standalone Java application that executes the
transformations on the given XML files. This module is described in detail in the Notice
viewer section.

5.1 TED WEBSITE

5.1.1 OVERVIEW
The TED system architecture is based on the J2EE application architecture. This architecture is
decomposed into ‘tiers’ and ‘layers’ as recommended by the J2EE specification.
The Layering design pattern, when applied to a system, breaks down the complexity of the system as a
whole by identifying the different parts of the system and reducing coupling between them. Layering
reduces the impact of a change in one layer on the rest of the system. Multi-dimensional layering
combines two strategies:
• Responsibility-based layering, which associates each layer with a specific responsibility
(presentation, business and integration);
• Reuse-based layering, which identifies components that have a high potential for reuse,
possibly across different projects.
The application is made of several responsibility-based layers:

Figure 2 TED website responsibility based layers

• The Web Layer contains the logic that handles the interaction between the user and the system
via a web browser. To achieve this interaction, the Web Layer may call the high-level
functions provided by the Service Layer and manipulate the domain models of the Domain Layer
that are exposed to the presentation;
• The Service Layer contains the business logic structured in high-level methods oriented
around use cases that may result in CRUD operations on entities of the Domain Layer. These
CRUD operations are realized by accessing the Data Access Layer;
• The Domain Layer contains data, common rules and logic of the model: business or technical,
persistent or transient;
• The Data Access Layer acts as an intermediary between the entities of the Domain Layer and the
technical solutions ensuring their durability. The Data Access Layer knows how and where the
persistent entities are stored. Typically, an entity of the Domain Layer has a corresponding
Data Access Object (DAO) in this layer that exposes methods to manage the object
persistence.
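
The four layers can be illustrated with a minimal plain-Java sketch. It is not taken from the TED code base: the names Notice, NoticeDao, InMemoryNoticeDao, NoticeService and NoticeController are hypothetical, and an in-memory map stands in for the real persistent store. It only shows the direction of the calls (Web Layer to Service Layer to Data Access Layer) and that the domain entity knows nothing about its persistence.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical names throughout; none of these types come from the TED code base.
class LayeringSketch {

    // Domain Layer: a plain entity, unaware of how it is persisted.
    static final class Notice {
        final String id;
        final String title;
        Notice(String id, String title) { this.id = id; this.title = title; }
    }

    // Data Access Layer: the only layer that knows where entities are stored.
    interface NoticeDao {
        void save(Notice notice);
        Optional<Notice> findById(String id);
    }

    // In-memory stand-in for a real persistent store.
    static final class InMemoryNoticeDao implements NoticeDao {
        private final Map<String, Notice> store = new HashMap<>();
        public void save(Notice notice) { store.put(notice.id, notice); }
        public Optional<Notice> findById(String id) { return Optional.ofNullable(store.get(id)); }
    }

    // Service Layer: use-case oriented methods that delegate CRUD work to the DAO.
    static final class NoticeService {
        private final NoticeDao dao;
        NoticeService(NoticeDao dao) { this.dao = dao; }

        void publish(String id, String title) {
            if (title == null || title.trim().isEmpty()) {
                throw new IllegalArgumentException("A notice needs a title");
            }
            dao.save(new Notice(id, title));
        }

        String titleOf(String id) {
            return dao.findById(id).map(n -> n.title).orElse("(unknown notice)");
        }
    }

    // Web Layer: turns a user request into a service call and a rendered response.
    static final class NoticeController {
        private final NoticeService service;
        NoticeController(NoticeService service) { this.service = service; }
        String show(String id) { return "<h1>" + service.titleOf(id) + "</h1>"; }
    }
}
```

In this sketch the controller never touches the DAO directly, so replacing InMemoryNoticeDao with a database-backed implementation would leave the Web and Service layers unchanged.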

5.1.2 WEB LAYER DESIGN PACKAGE


The Web Layer is built on top of the Spring MVC framework. This framework implements and makes
intensive use of the following design patterns:
• Model-View-Controller;
• Front Controller;
• Command Object.
The purpose here is not to give a complete explanation of how Spring MVC works but rather to
describe the philosophy and how in practice the TED system uses and extends Spring MVC.
The Model-View-Controller is the separation of concerns applied to the presentation tier, i.e., it
separates the view from the business data and processes, the controller being responsible for
handling requests and acting as a medium between the model and the view. With Spring MVC,
business objects can be reused as they are (no class extension or interface implementation required).
The following figure shows how Spring MVC components interact with the Application and Data
Access layers.

Figure 3 Integration of Spring MVC with other layers


In TED, the Model-View-Controller is implemented as Java classes extending the Spring Framework
classes (such as SimpleFormController).
The Front Controller design pattern avoids having a separate servlet for each controller. Instead,
Spring MVC provides a generic servlet, the DispatcherServlet, which dispatches the request to a
specific controller. In TED, the Front Controller is handled by the DispatcherServlet class of the
Spring Framework.
The Command Object design pattern is used to map the HTTP request and parameters to a Java
object holding all the information. In TED, the Command Objects are implemented as simple Java
classes, which all extend the same parent class: TedDefaultPageCommand.
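
The Front Controller and Command Object patterns can be sketched in plain Java without Spring. The Dispatcher, SearchCommand and searchController names below are hypothetical and are not Spring MVC or TED classes; the sketch only shows a single entry point routing a request path to a controller, which binds the raw parameters to a typed command object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical names; this is not the Spring MVC API.
class FrontControllerSketch {

    // Command Object: a plain class holding the request parameters in typed fields.
    static class SearchCommand {
        final String text;
        final int page;
        SearchCommand(Map<String, String> params) {
            this.text = params.getOrDefault("text", "");
            this.page = Integer.parseInt(params.getOrDefault("page", "1"));
        }
    }

    // Front Controller: one entry point mapping request paths to controllers.
    static class Dispatcher {
        private final Map<String, Function<Map<String, String>, String>> controllers = new HashMap<>();

        void register(String path, Function<Map<String, String>, String> controller) {
            controllers.put(path, controller);
        }

        String dispatch(String path, Map<String, String> params) {
            Function<Map<String, String>, String> controller = controllers.get(path);
            return controller == null ? "404 Not Found" : controller.apply(params);
        }
    }

    // A controller binds the raw parameters to a command object, then builds the view.
    static String searchController(Map<String, String> params) {
        SearchCommand command = new SearchCommand(params);
        return "Results for '" + command.text + "', page " + command.page;
    }
}
```

A caller would register the controller once, for example with register("/search", FrontControllerSketch::searchController), and then send every request through dispatch.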

5.1.2.1 Access Control


When a request is submitted to a web application protected by Spring Security, it is guaranteed to
be processed by Spring Security, which relies on the standard Java Servlet and Filter mechanisms.
Spring Security provides a framework and a set of components to build the whole security processing
chain. A request to a protected resource therefore passes through each of Spring Security’s filters,
as depicted in the figure below:

Figure 4 Flow of a Request through Spring Security Filters


Each of these filters plays a specific role in the level of security applied to a web resource.
The first filter, for example, can enforce that web requests use a given channel, such as HTTPS.
The second filter, the Authentication-Processing Filter, is in charge of redirecting and
authenticating the user if the web resource is protected. The fourth filter, the Security
Enforcement Filter, checks that the logged-in user has the appropriate access rights for that web
resource. This check is modular and may combine different rules, allowing complex Access Control
Lists (ACL).
This simple but powerful chaining mechanism ensures that all requests made by a web browser
comply with the security constraints imposed. These constraints can be set in configuration files,
so that they remain external to the source code.
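
The chaining mechanism can be sketched in plain Java as follows. This is not the Spring Security API: the SecurityFilter interface, the three filters and the SUBSCRIBER role are hypothetical, and the real chain contains more filters than the three modelled here. The sketch only shows that filters run in order and that the first rejection stops the chain.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical names; a simplified model of a security filter chain.
class FilterChainSketch {

    static class Request {
        final boolean secureChannel;   // true when the request arrived over HTTPS
        final String user;             // null when nobody is logged in
        final Set<String> roles;
        Request(boolean secureChannel, String user, String... roles) {
            this.secureChannel = secureChannel;
            this.user = user;
            this.roles = new HashSet<>(Arrays.asList(roles));
        }
    }

    // A filter either rejects the request (non-null reason) or lets it continue (null).
    interface SecurityFilter {
        String check(Request request);
    }

    static final SecurityFilter CHANNEL = r -> r.secureChannel ? null : "redirect to HTTPS";
    static final SecurityFilter AUTHENTICATION = r -> r.user != null ? null : "redirect to login";
    static final SecurityFilter ACCESS_RIGHTS = r -> r.roles.contains("SUBSCRIBER") ? null : "403 Forbidden";

    static final List<SecurityFilter> CHAIN = Arrays.asList(CHANNEL, AUTHENTICATION, ACCESS_RIGHTS);

    // Runs the filters in order; the first rejection stops the chain.
    static String handle(Request request) {
        for (SecurityFilter filter : CHAIN) {
            String rejection = filter.check(request);
            if (rejection != null) {
                return rejection;
            }
        }
        return "200 OK";
    }
}
```

Because the chain is just an ordered list, a filter can be added, removed or reordered without touching the others, which is the property the configuration-driven approach relies on.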

5.1.2.2 Rollover menus with CSS


All rollover menus of the website use the “:hover” CSS pseudo-class.
IE6 does not support this pseudo-class on every HTML tag. To make it work on IE6 we use a JavaScript
function and a supplementary class in the CSS files (see below).
To inform IE6 users who have JavaScript disabled, our HTML pages use the conditional comment
<!--[if lt IE 7]>. Inside this condition, a <noscript> tag displays a warning message telling users
that the navigation is unusable in IE 6 without JavaScript.
JavaScript function:

<!--[if lt IE 7]>
<script type="text/javascript">
// Replacement for "li:hover" in IE 6
var sfHover = function() {
    var sfEls = document.getElementsByTagName("li");
    for (var i = 0; i < sfEls.length; i++) {
        sfEls[i].onmouseover = function() {
            // Remove any existing marker first so the class is never duplicated
            this.className = this.className.replace(new RegExp(" sfhover\\b"), "");
            this.className += " sfhover";
        };
        sfEls[i].onmouseout = function() {
            // Remove the marker together with its leading space
            this.className = this.className.replace(new RegExp(" sfhover\\b"), "");
        };
    }
};
if (window.attachEvent) window.attachEvent("onload", sfHover);
</script>
<![endif]-->
Warning Message:
<!--[if lt IE 7]>
<noscript>
<span class="red">
Warning: you are using an old version of Internet Explorer without
JavaScript ...
</span>
</noscript>
<![endif]-->
CSS class:
Every <tag>:hover must have an equivalent <tag>.sfhover

5.1.3 SERVICE LAYER DESIGN PACKAGE


Transaction demarcation is managed declaratively using the Spring Framework. The underlying
transactions are handled by the Spring DataSourceTransactionManager.
Transactions are defined at the service level. Service class methods represent use cases that are
usually considered atomic from a transactional point of view, which makes them a natural place to
demarcate transactions. Following Spring’s philosophy, all aspects of a transaction (isolation
level, timeout, etc.) are configurable through annotations.
Because the boundaries are declared rather than coded, defining them requires no transaction-handling
code in the Service classes.

5.1.3.1 Search service


One of the major characteristics of the TED website is its search capability. All the search
functionalities are implemented on top of the Lucene library.
Apache Lucene is a high-performance, full-featured text search engine library written in Java. It is
suitable for any application that requires full-text indexing and searching capability. Lucene has
been widely recognized for its utility in the implementation of Internet search engines and local,
single-site searching. The Lucene API is also known for its flexibility, which makes it independent
of the format of the files to index.

All the search capabilities needed by the TED website are encapsulated within this search service.
The index file grows progressively as the information retrieved from parsing and indexing new
documents is added to it. This indexing process is handled by the content management module,
which performs the operation for each new OJS release.
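
The idea of an index that grows with each release can be illustrated with a toy inverted index in plain Java. This is a deliberately simplified stand-in and not the Lucene API: the InvertedIndexSketch class is hypothetical, the tokeniser is crude (it lowercases and splits on non-word characters), and there is no ranking, field support or persistence.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index; a real deployment would rely on Lucene itself.
class InvertedIndexSketch {

    // term -> identifiers of the documents containing it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Adds one document; earlier documents stay searchable, so the index only grows.
    void add(String docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns the identifiers of the documents containing the given term.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

Each call to add corresponds, in spirit, to indexing one new document of a release; search then answers from the accumulated postings without re-reading the documents.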

5.1.4 DOMAIN LAYER


The Domain Layer contains data, common rules and logic of the model. This layer contains the
identified business entities. This layer is unaware of how the domain object persistence is managed.
That is the responsibility of the Data Access Layer.

5.1.5 DATA ACCESS LAYER


This section describes how TED accesses the database through JDBC using Spring.
The JdbcTemplate class is the central class in the Spring JDBC core package that is used by TED. It
simplifies the use of JDBC since it handles the creation and release of resources. This helps to avoid
common errors such as forgetting to close the connection. It executes the core JDBC workflow,
such as statement creation and execution, leaving application code to provide SQL and extract results.
This class executes SQL queries, update statements or stored procedure calls, initiating iteration over
ResultSets and extraction of returned parameter values. It also catches JDBC exceptions and translates
them to a more informative exception hierarchy.
The system makes use of the SimpleJdbcTemplate class, a wrapper around the classic
JdbcTemplate that takes advantage of Java 5 language features such as variable arguments and
autoboxing.
In order to work with data from a database, one needs to obtain a connection to the database. Spring
does this through a DataSource. A DataSource is part of the JDBC specification and can
be seen as a generalized connection factory. It allows a container or a framework to hide connection
pooling and transaction management issues from the application code.
Spring provides other utility types such as the RowMapper. A RowMapper implementation maps each row
obtained while iterating over the ResultSet created by the execution of the query to one object.
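
The division of work between the template and the row mapper can be sketched in plain Java. The sketch does not use JDBC or Spring: result rows are simulated as maps, and SimpleRowMapper, query, row and describe are hypothetical names, as are the ID and TITLE columns. It only shows that the template owns the iteration while the mapper handles a single row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; not the Spring JDBC types.
class RowMappingSketch {

    // Stand-in for a row mapper: turns one result row into one object.
    interface SimpleRowMapper<T> {
        T mapRow(Map<String, Object> row, int rowNum);
    }

    // Stand-in for the template: owns the iteration and delegates the mapping.
    static <T> List<T> query(List<Map<String, Object>> rows, SimpleRowMapper<T> mapper) {
        List<T> results = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            results.add(mapper.mapRow(rows.get(i), i));
        }
        return results;
    }

    // Builds one simulated result row with hypothetical column names.
    static Map<String, Object> row(int id, String title) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ID", id);
        row.put("TITLE", title);
        return row;
    }

    // Example mapper: one "id: title" string per row.
    static List<String> describe(List<Map<String, Object>> rows) {
        return query(rows, (row, rowNum) -> row.get("ID") + ": " + row.get("TITLE"));
    }
}
```

The application code in describe contains no loop and no resource handling, which mirrors the separation the paragraph above attributes to JdbcTemplate and RowMapper.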

5.1.6 GENERAL PRINCIPLES


This section contains the general principles underlying the system and promoted by the architecture.
These principles are too general to be exposed as a specific design package but they are important
enough to be mentioned.
This section provides a short description of these principles.

5.1.6.1 Programming to Interfaces


This principle is also known in the longer version ‘Programming to Interfaces, not implementations’.
When a piece of software is developed, an implementation class must not depend directly on
other implementation classes but rather on the interfaces they implement.
This improves the extensibility and maintainability of the software, as other implementations of the
interfaces can be substituted for the current one with little impact on the dependent modules.
Its use is facilitated by the ‘Dependency Injection’ principle.
The ‘Programming to Interfaces’ principle also eases the test strategy of the software, mostly with unit
testing. Classes are tested in isolation, as the test provides mock implementations of the
interfaces on which the tested object depends.
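
A minimal sketch of the principle and of its benefit for unit testing, assuming a hypothetical TranslationService interface and NoticeRenderer class (neither is a TED type). The renderer depends only on the interface, so a test can hand it a mock written as a one-line lambda.

```java
// Hypothetical names; an illustration of programming to interfaces.
class InterfaceSketch {

    // The dependency is expressed as an interface...
    interface TranslationService {
        String translate(String text, String language);
    }

    // ...so the dependent class never names a concrete implementation.
    static class NoticeRenderer {
        private final TranslationService translations;

        NoticeRenderer(TranslationService translations) {
            this.translations = translations;
        }

        String renderTitle(String title, String language) {
            return "[" + language + "] " + translations.translate(title, language);
        }
    }

    // In a unit test, a lambda can serve as the mock implementation.
    static String renderWithMock() {
        TranslationService mock = (text, language) -> "translated:" + text;
        return new NoticeRenderer(mock).renderTitle("Road works", "FR");
    }
}
```

The mock keeps the test independent of any real translation back end, so a failure points at NoticeRenderer alone.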

5.1.6.2 Dependency Injection


The ‘Dependency Injection’ principle greatly facilitates the previous design principle, ‘Programming to
Interfaces’. It removes the need for each object to declare explicitly, in the Java code, its
dependencies on implementation classes. Configuration files do the job instead.
Each object is created by a container that populates the object with its dependencies. Thus, the object
no longer knows the implementation classes, only the interfaces.
In this project, this container is shipped with the Spring Framework. Dependency injection, a form of
Inversion of Control (IoC), is the core principle of Spring.
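The idea can be sketched without Spring. The following is a deliberately minimal stand-in for a
container, not Spring's API: TinyContainer, MailSender, SmtpMailSender and ReminderService are
hypothetical names, and the wiring is done in code here, whereas in the TED system it is described in
Spring configuration files.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

interface MailSender {
    String send(String to);
}

class SmtpMailSender implements MailSender {
    @Override
    public String send(String to) {
        return "sent to " + to;
    }
}

// Knows only the MailSender interface; the implementation is injected.
class ReminderService {
    private final MailSender mailSender;

    ReminderService(MailSender mailSender) {
        this.mailSender = mailSender;
    }

    String remind(String address) {
        return mailSender.send(address);
    }
}

// Hypothetical, minimal container: maps a type to a factory that builds it.
class TinyContainer {
    private final Map<Class<?>, Supplier<?>> bindings = new HashMap<>();

    <T> void bind(Class<T> type, Supplier<? extends T> factory) {
        bindings.put(type, factory);
    }

    <T> T get(Class<T> type) {
        Supplier<?> factory = bindings.get(type);
        if (factory == null) {
            throw new IllegalStateException("No binding for " + type.getName());
        }
        return type.cast(factory.get());
    }
}

class DependencyInjectionDemo {
    // The only place where implementation classes are named.
    static TinyContainer wire() {
        TinyContainer container = new TinyContainer();
        container.bind(MailSender.class, SmtpMailSender::new);
        container.bind(ReminderService.class,
                () -> new ReminderService(container.get(MailSender.class)));
        return container;
    }

    public static void main(String[] args) {
        System.out.println(wire().get(ReminderService.class).remind("authority@example.eu"));
    }
}
```

Replacing SmtpMailSender by another MailSender only requires a change in wire(); ReminderService is
untouched.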

5.1.6.3 Aspect Oriented Programming


Aspect-Oriented Programming (AOP) complements Object-Oriented Programming (OOP) by providing
another way of thinking about program structure. In addition to classes, AOP gives you aspects.
Aspects enable modularization of concerns such as transaction management that cut across multiple
types and objects. (Such concerns are often termed crosscutting concerns.)
One of the key components of Spring is the AOP framework. While the Spring IoC container does not
depend on AOP, meaning you don't need to use AOP if you don't want to, AOP complements Spring
IoC to provide a very capable middleware solution.
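A crosscutting concern can be illustrated with a JDK dynamic proxy, one of the proxying mechanisms
Spring AOP can rely on for interface-based beans. The sketch below is not Spring code: NoticeService,
SimpleNoticeService and TransactionAspect are hypothetical names, and the "transaction" is only
recorded in a log to show where begin, commit and rollback would take place.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface NoticeService {
    String publish(String noticeId);
}

// Business class: contains no transaction handling at all.
class SimpleNoticeService implements NoticeService {
    @Override
    public String publish(String noticeId) {
        return "published " + noticeId;
    }
}

// The "aspect": wraps every call on the target with begin/commit/rollback markers.
class TransactionAspect implements InvocationHandler {
    private final Object target;
    final List<String> log = new ArrayList<>();

    TransactionAspect(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        log.add("begin " + method.getName());
        try {
            Object result = method.invoke(target, args);
            log.add("commit " + method.getName());
            return result;
        } catch (InvocationTargetException e) {
            log.add("rollback " + method.getName());
            throw e.getCause(); // rethrow the exception raised by the target method
        }
    }
}

class AopDemo {
    // Builds a proxy that implements NoticeService and routes calls through the aspect.
    static NoticeService proxied(TransactionAspect aspect) {
        return (NoticeService) Proxy.newProxyInstance(
                NoticeService.class.getClassLoader(),
                new Class<?>[] { NoticeService.class },
                aspect);
    }

    public static void main(String[] args) {
        TransactionAspect aspect = new TransactionAspect(new SimpleNoticeService());
        System.out.println(proxied(aspect).publish("notice-1"));
        System.out.println(aspect.log);
    }
}
```

The caller works with a NoticeService and is unaware that the transaction handling was added around
it.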

5.2 MONITORING DATA-WAREHOUSE


The data warehouse information is made available for administrators using the web interface. Its
content is built using several tools, which are described in this section.
The Layering design pattern is also applied for the monitoring data-warehouse to break down the
complexity of the system as a whole by identifying the different parts of the system and reducing
coupling between them. The following sections give an overview of the components that are used to
combine and represent the information needed into web reports.

5.2.1 BIRT
BIRT (Business Intelligence and Reporting Tools) is a reporting system for web applications. BIRT has
two main components: a report designer based on Eclipse, and a runtime component. BIRT also offers
a charting engine that lets you add charts to your own application.
Within the TED project, BIRT is used to address a wide range of reporting needs, including:
• Lists - The simplest reports are lists of data. As the lists get longer, BIRT supports grouping to
organize related data together, as well as totals, averages and other summaries.
• Charts - For some reports, numeric data are presented as a chart. BIRT provides pie charts,
line charts, bar charts and many more. BIRT charts can be rendered in several formats.
• Crosstabs - Crosstabs (also called cross-tabulations or matrices) are used to display reports
that need to represent data in two dimensions.
• Compound Reports - This kind of report combines the previously described elements side by
side in a single document.
BIRT reports consist of four main parts: data, data transformations, business logic and presentation.
• Data - Several kinds of data sources may be used simultaneously with BIRT. For the TED
project, the main data source is the data warehouse database. JDBC is used as the connector
between the database and BIRT.
• Data Transformations - Reports present data sorted, summarized, filtered and grouped to fit
the user's needs. While the database can do some of this work, BIRT is used to perform
sophisticated operations such as grouping on sums, percentages of overall totals and more.
• Business Logic - Since data is seldom structured exactly as it is needed, some reports require
business-specific logic to convert raw data into information useful for the user.
• Presentation - Once the data is ready, a wide range of display options may be used: tables,
charts, text and more.

5.2.2 CACTI
Cacti is a complete network graphing solution designed to harness the power of RRDTool's data
storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple
data acquisition methods, and user management features out of the box.

Figure 5 CACTI Active Session diagram

5.2.3 WEBALIZER
Website traffic analysis is produced by grouping and aggregating various data items captured by the
web server in the form of log files while the website visitor is browsing the website.

Figure 6 Webalizer Traffic Analysis diagram

Figure 7 Webalizer Summary by Month diagram

5.3 LICENSE HOLDER ENVIRONMENT


The License Holder environment module consists solely of the ProFTPD server and its modules.
The content of the environment is generated by the content management module. A symbolic link used
by the ProFTPD server is then updated to make the new files available to the License Holders.

5.3.1 AUTHENTICATION AND LOGGING


The ProFTPD server for the License Holders makes use of a specific module to extend its
functionality. The required functions are:
• Authentication using the user information contained in the MySQL database.
• Logging of the License Holder environment usage statistics. These files will be parsed and the
extracted information will be stored in the data warehouse database.
The mod_sql module is installed to add these two functions to ProFTPD. It is composed of a
front-end module (mod_sql) and back-end database-specific modules (mod_sql_mysql). The front-end
module leaves the specifics of handling database connections to the back-end modules.

5.4 EMAIL ANALYSIS AND NOTIFICATIONS


The email analysis and notifications module is in charge of the analysis of received emails and the
mailing of notifications and reminders to Contracting Authorities or web site users.
The email analysis and notifications module is implemented as an email processing agent built on top
of the Apache James Mailet API. A mailet is a mail processing component which is executed within
a mailet container.
The Mailet API defines interfaces for both Matchers and Mailets:
• Matchers are used to match mail messages against certain conditions. They return some
subset (possibly the entire set) of the original recipients of the message if there is a match. An
inherent part of the Matcher contract is that a Matcher should not induce any changes in a
message under evaluation.
• Mailets are responsible for actually processing the message. They may alter the message in
any fashion, or pass the message to an external API or component. This can include
delivering a message to its destination repository or SMTP server.
In the TED project, Matchers are used to analyse the emails and detect spam. An internet blacklist is
used to detect undesirable emails: any mail whose sender matches an entry in this blacklist is
automatically forwarded to the spam folder. “Out of office” replies are also handled in a special
way: all incoming mails are searched for a given pattern (for instance “*out of office*”) in the subject
or content of the mail. If the pattern matches, the mail is automatically flagged as out-of-office and
forwarded to the out-of-office folder. The subject of these mails is prefixed with “Out of office”.
Mailets, on the other hand, are used to send the notifications.
For the TED project, the Apache JAMES server is used as the container and is responsible for the
assembly and configuration of the deployed Mailets and Matchers.
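The out-of-office handling described above can be sketched in plain Java. This sketch does not use
the real Mailet API: IncomingMail and OutOfOfficeFilter are hypothetical classes, and the folder
names and the “: ” separator after the prefix are assumptions. It keeps the separation described
above: matches() only inspects the mail, process() is the part that alters it.

```java
import java.util.Locale;

// Hypothetical, simplified mail representation (not a Mailet API type).
class IncomingMail {
    String subject;
    String body;
    String folder = "inbox";

    IncomingMail(String subject, String body) {
        this.subject = subject;
        this.body = body;
    }
}

class OutOfOfficeFilter {
    private static final String PATTERN = "out of office";
    private static final String PREFIX = "Out of office: ";

    // Matcher-like part: inspects subject and content, never modifies the mail.
    static boolean matches(IncomingMail mail) {
        return contains(mail.subject) || contains(mail.body);
    }

    private static boolean contains(String text) {
        return text != null && text.toLowerCase(Locale.ROOT).contains(PATTERN);
    }

    // Mailet-like part: prefixes the subject and moves the mail to the dedicated folder.
    static void process(IncomingMail mail) {
        if (matches(mail)) {
            mail.subject = PREFIX + (mail.subject == null ? "" : mail.subject);
            mail.folder = "out-of-office";
        }
    }
}

class OutOfOfficeDemo {
    public static void main(String[] args) {
        IncomingMail mail = new IncomingMail("Re: your notice", "I am out of office until Monday.");
        OutOfOfficeFilter.process(mail);
        System.out.println(mail.folder + " / " + mail.subject);
    }
}
```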

5.5 WORKFLOW ENGINE


The workflow engine is responsible for the processing of the document files received by the
Publications Office and the creation of the file system used for the creation of the DVD images (daily,
weekly and monthly images).
The operations performed are mainly transformation and indexing of the received files. The content
management module also contains the production management dashboard, which is used to monitor
and control the steps depicted in the figure below:

Figure 8 Production management steps

5.5.1 VALIDATION AND FILES TRANSFORMATION


The purpose of the Validation and files transformation step is to create the formatted content to be
published from the received XML notices. The prepared content is then stored in the content library.
The different transformations may be executed at different times. Technically, the transformations are
performed through XSLT for all the formats to be supported.
RSS feeds are generated for the publication day by querying the corresponding notices, formatting the
RSS feed and storing the result in the content library. RSS feeds are generated after the creation of
the index, as described in section 5.5.3.
Notice family changes are propagated to existing notices.
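As an illustration of an XSLT-based transformation, the following sketch uses the standard JAXP API
(javax.xml.transform) shipped with the JDK. The stylesheet and the notice structure (notice/title)
are invented for the example and are far simpler than the real TED formats.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    // Hypothetical stylesheet: renders the title of a notice as a heading.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/notice\"><h1><xsl:value-of select=\"title\"/></h1></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given XML document and returns the result as text.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String notice = "<notice><title>Supply of office furniture</title></notice>";
        System.out.println(transform(notice, XSL));
    }
}
```

Supporting an additional output format then amounts to providing another stylesheet; the Java code
stays the same.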

5.5.2 PDF GENERATION AND TIME-STAMPING


For the generation of PDFs, we use XSL-FO as an intermediate format and a custom version of Apache
FOP as the composition engine. Apache FOP was customized in order to add support for PDF/A-1a.
Standard compression and file organisation techniques are used to compile the results per publication
channel.
The time stamping of the PDF/A-1a notices is performed by the PDF Time Stamping tool using open
source PDF and cryptography libraries. Once the PDFs are time stamped, they are stored in the
content library. [1]

[1] Note that PDF time stamping is currently not activated on the TED web site: a flag allows the
time-stamping service to be put in a degraded mode.

5.5.3 INDEXING
All the files are indexed after validation and transformation using Apache Lucene. Documents are
parsed to extract the elements needed for searches on specific fields as well as for the free text
search. The indexing process is split into three distinct steps: creation of the five-day index,
creation of the active index and, finally, update of the archive index.

5.5.4 DVD IMAGE CREATION


Three distinct DVD images related to the dissemination of the Supplement to the Official Journal are
created by the system during the production process. These image files contain different file types
which are mainly related to the PDF format:
• PDF/A-1a: the file format for the long-term archiving of electronic documents. It is based on
the PDF Reference Version 1.4 from Adobe Systems Inc.
• PDF/A-1a time stamped: the time-stamped version of the PDF/A-1a document file.
• PDX: the Acrobat Catalogue Index file contains the index of all the documents of an OJS
issue. Acrobat Reader is able to use this kind of file directly to perform searches on the content
of the documents. This file is built for the weekly DVD-ROM.
The PDF/A-1a and time-stamped PDF/A-1a files of the documents are generated by the system from
the XML documents. These transformations are explained in Validation and files
transformation. In the weekly DVD-ROM image, a PDX or PDF index file is created manually to index
all the documents of the current OJS issue. Adobe Acrobat Professional is used for this purpose by the
publishing operations team. For performance reasons, this tool is installed in the production
environment to ensure direct access to the files to be indexed. Therefore, remote desktop access to
Acrobat Professional is put at the disposal of the publishing team.
The table of contents PDF file is generated automatically by the system using iText. iText is a library
available under the LGPL license for dynamic PDF document generation and manipulation.
The creation of the image file is an automated process triggered by the publishing operations team.
This is achieved using the mkisofs tool, with support for the UDF format.

5.5.5 CONTRACTING AUTHORITY NOTIFICATION

Contracting Authorities are notified by the TED system about the publication of their notices. A
notification is sent to each Contracting Authority to inform it that its notices have been published in
the OJS. The email contains a URL link to the notice of the corresponding Contracting Authority and
the time-stamped PDF/A-1a.

The TED system also sends a reminder to the Contracting Authority for each contract notice that does
not have a corresponding award notice.
In order to send reminder and notification emails, the TED system needs to be able to retrieve the
email address of the Contracting Authority for each specific notice. Unfortunately, there is no
“standard” way to extract this email address: the information does not exist in the common notice
XML header, and a different extraction method is needed for each type of form. The table named
DOCUMENT_XML_INFO contains the XPath to the Contracting Authority email for the different types of
forms. This implementation choice avoids hard-coding the extraction rules and provides a much more
flexible way to support new forms in the system.
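The table-driven extraction can be sketched with the standard javax.xml.xpath API. In this sketch an
in-memory map stands in for the DOCUMENT_XML_INFO table; the form type names, the XPath
expressions and the XML structures are invented for the example and do not reflect the real TED
schemas.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class AuthorityEmailExtractor {
    // Stand-in for DOCUMENT_XML_INFO: one XPath expression per form type (hypothetical content).
    private final Map<String, String> xpathByFormType = new HashMap<>();

    void register(String formType, String xpathExpression) {
        xpathByFormType.put(formType, xpathExpression);
    }

    // Returns the email found in the notice, or null if the form type is unknown
    // or the expression selects nothing.
    String extractEmail(String formType, String noticeXml) {
        String expression = xpathByFormType.get(formType);
        if (expression == null) {
            return null;
        }
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            String value = xpath.evaluate(expression,
                    new InputSource(new StringReader(noticeXml))).trim();
            return value.isEmpty() ? null : value;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath or XML for form " + formType, e);
        }
    }

    public static void main(String[] args) {
        AuthorityEmailExtractor extractor = new AuthorityEmailExtractor();
        extractor.register("CONTRACT", "/notice/authority/email");
        String xml = "<notice><authority><email>procurement@example.eu</email></authority></notice>";
        System.out.println(extractor.extractEmail("CONTRACT", xml));
    }
}
```

Supporting a new form type only requires registering one more expression, which mirrors adding a
row to the table.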

5.6 NOTICE VIEWER


The notice viewer module is a stand-alone Java application responsible for the transformation
of XML files into a well-formatted version suitable for direct display. This application is installed
on an Office server and is used directly from the command line. The notice viewer includes neither a
user interface nor persistence capabilities.
The notice viewer transformation supports two output formats: HTML and PDF. It first validates the
input notice against the XML schema, then performs a UTF-8 validation of the notice.
Technically, the transformations are performed through XSLT for the generation of HTML files. For
the generation of PDFs, we use XSL-FO as an intermediate format and Apache FOP as the
composition engine. A file-based, read-only HSQL database is used to retrieve the translations of
the different reference data.

6 IMPLEMENTATION VIEW
The implementation view describes the overall structure of the implementation model and the
decomposition of the software into modules and specific components.

6.1 TED WEBSITE

6.1.1 OVERVIEW

The TED application is packaged as two separate Web Archive files (WAR) that represent the TED
website and the data-warehouse. This separation allows the deployment of each of these applications
separately on different servers. The following figure shows the physical contents of these web
applications. Note that the two applications share the same file structure; the difference being the
specific JSP pages and Java classes (along with their dependent Java libraries).

Figure 9 Web application file structure

6.1.2 TED XSL TRANSFORMATION


The TED Website relies on XSL transformations to transform the input TED_EXPORT XML file into the
different presentation views: HTML or PDF.
During the daily processing (see Workflow engine), the export XML file is transformed into an internal
XML format (TED_INTERNAL) for each supported language. The TED internal XML format contains all
the information needed for the presentation layer. For instance, the reference data are translated in
the internal XML file for each supported language, and the internal format is enriched with formatting
information such as paragraphs, URLs and email addresses.
Only documents transformed during the daily processing are persisted on the file system. Historical
documents (from OJS 2005/206 to OJS 2010/041) are transformed into the TED internal format on the
fly. The following table lists the XSL transformations (input/output) of the TED system.

TED TRANSFORMATION
XSL                                       Input                      Output
----------------------------------------  -------------------------  --------------------------
InternalOJS-To-InternalTed.xsl            “2.0.5 DTD” xml or         TED_INTERNAL XML
InternalOJS-To-InternalTed_<<FORM>>.xsl   “TED_EXPORT 2.0.7” xml
InternalTed-To-HtmlTed.xsl                TED_INTERNAL xml           Notice HTML
InternalTed-To-HtmlDataViewTed.xsl        TED_INTERNAL xml           Notice data view HTML
InternalTed-To-XmlFOTed.xsl               TED_INTERNAL xml           Notice PDF
InternalTed-To-LicenseHolderMETA.xsl      INTERNAL_OJS xml           Notice Meta License holder
InternalTed-To-LicenseHolderUTF-8.xsl     INTERNAL_OJS xml           Notice UTF-8 License holder

Table 3: TED XSL Transformation

6.2 EMAIL ANALYSIS AND NOTIFICATIONS


The email analysis and notifications module is packaged in a JAR file that contains the classes
developed for the handling and filtering of emails. This JAR is deployed on the James email server.
The email module is implemented as an email processing agent built on top of the Apache James
Mailet API, using the Matcher and Mailet interfaces.

6.3 WORKFLOW ENGINE

6.3.1 THE WORKFLOW ENGINE PACKAGE


The workflow engine is packaged as an executable JAR file that contains the classes developed for
file system creation, XML transformation, file indexing and DVD generation. The workflow engine is
responsible for the instantiation and processing of a new flow for each publication date.

Figure 10 Workflow engine package Structure

6.3.2 THE WORKFLOW ENGINE IMPLEMENTATION


The workflow engine is composed of multiple flow definitions:
- The daily OJS flow: responsible for the processing of the data for the next
publication date.
- The User management flow: responsible for the workflow management.
- The Contracting Authority reminder flow: responsible for sending the notice
reminders.
- The reporting flow: responsible for the processing of the reports for Cacti and
the data warehouse.
- The cleanup flow: responsible for cleaning the file system of all temporary files.
Remark: a specific flow also exists that is used only once, for the historical data processing.
- The take up archive flow: responsible for the processing of the full historical data
already in production (5 years of publication).

Figure 11 Flow Service class diagram


Each workflow definition is defined in a Spring configuration. The flow definition must implement the
ProdFlowService interface. It contains the list of steps that compose the flow.
The flow definition is composed of several steps, each responsible for the execution of a specific task
according to the step specification. These steps must implement the ProdStepService interface and
specify the dependencies between the steps (waitingSteps).
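How waitingSteps can drive the execution order is sketched below. The methods of ProdFlowService
and ProdStepService are not described here, so the sketch uses its own hypothetical classes
(FlowStep, FlowScheduler) and invented step names; it only shows one possible way to derive an
order in which every step runs after the steps it waits for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical step description: a name plus the names of the steps it waits for.
class FlowStep {
    final String name;
    final List<String> waitingSteps;

    FlowStep(String name, String... waitingSteps) {
        this.name = name;
        this.waitingSteps = Arrays.asList(waitingSteps);
    }
}

class FlowScheduler {
    // Returns the step names in an order where each step comes after all its waitingSteps.
    static List<String> executionOrder(List<FlowStep> steps) {
        Map<String, FlowStep> byName = new LinkedHashMap<>();
        for (FlowStep step : steps) {
            byName.put(step.name, step);
        }
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (done.size() < byName.size()) {
            boolean progressed = false;
            for (FlowStep step : byName.values()) {
                if (!done.contains(step.name) && done.containsAll(step.waitingSteps)) {
                    order.add(step.name);
                    done.add(step.name);
                    progressed = true;
                }
            }
            if (!progressed) {
                // No step could be scheduled: a dependency is cyclic or refers to an unknown step.
                throw new IllegalStateException("Cyclic or unknown waitingSteps");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<FlowStep> steps = Arrays.asList(
                new FlowStep("index", "transform"),
                new FlowStep("pdf", "transform"),
                new FlowStep("transform", "validate"),
                new FlowStep("validate"),
                new FlowStep("dvd", "pdf", "index"));
        System.out.println(executionOrder(steps));
    }
}
```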

6.3.3 THE INDEXING IMPLEMENTATION


During the daily OJS flow, several indexes are generated to provide a fast search engine to the TED
Website. The indexing process is based on the Lucene framework.
The TED application receives a TED_EXPORT XML file. The input XML file is converted into a Lucene
Document object in which each value to be indexed is mapped as a key/value pair.
Each field's value is analysed using the standard Lucene analyser and then indexed into the
appropriate folder. A full description of the indexed fields is available in Table 4: Indexed fields.

Figure 12 Indexing Flow

Search field                                                 Code
-----------------------------------------------------------  ----
Awarding authority search fields
  Country of the awarding authority                          CY
  Name of the awarding authority                             AU
  Place                                                      TW
  Type of awarding authority                                 AA
  Internet address (URL) of the awarding authority           IA
Date search fields
  Date document sent to the Publications Office              DS
  Deadline for request of documents                          DD
  Deadline for receipt of tenders                            DT
  Publication date                                           PD
Reference search fields
  Original language                                          OL
  Number of reference document                               RN
  Document number                                            ND
  Edition number of Supplement to the Official Journal       OJ
Codification search fields
  Type of document                                           TD
  Type of contract                                           NC
  Type of procedure                                          PR
  Origin (applicable regulation of procurement)              RP
  Type of tender, division into lots                         TY
  Criteria for award of contract                             AC
  Title of document                                          TI
  Main activity                                              MA
  Title of the main activity                                 MN
Classification search fields
  Original CPV code (until 16 September 2008)                OC
  Original title of the CPV code (until 16 September 2008)   ON
  Current CPV code (from 17 September 2008)                  PC
  Current title of the CPV code (from 17 September 2008)     PN
  NUTS code                                                  RC
  Title of the NUTS code                                     RG
TED specific fields
  Full text                                                  FT

Table 4: Indexed fields

6.3.4 THE WORKFLOW TRANSFORMATION


Several steps of the daily processing use XSL transformations to transform the TED_EXPORT
input files into other file formats such as license holder files or PDF notices.

Table 3: TED XSL Transformation lists the XSL transformations (input/output) used by the
TED Workflow engine.

6.3.5 THE WORKFLOW MANAGEMENT TOOL


The workflow management tool (Dashboard or workflow management interface) application is
packaged as a Web Archive (WAR). The workflow management tool allows the control of the production
lines within a single user interface. It is implemented using standard Java Servlets and JSP. The
communication between the management tool and the workflow engine is built on the socket API.

6.4 TED SYSTEM I18N SUPPORT


We use two mechanisms to support multilingualism in the TED System (TED Website and TED
Workflow engine). The translations of the business data (such as CPV, NUTS) are stored in the
database and the interface messages are stored in XML files.
• The reference data are all translated in the database, in the table <code>_Description, which
contains the translations in the 23 languages supported by the TED Website.
• The Spring Framework offers a simple mechanism to support i18n:
ReloadableResourceBundleMessageSource. The labels in the TED Website are all translated
using this Spring mechanism. The files messages_<<language code>>.xml contain the labels and
messages displayed to the users by the TED Website. The errors_<<language code>>.xml files
contain the error messages shown to the users.
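The lookup of a label with a fallback language can be sketched with the JDK alone. This sketch
assumes that the message files use the java.util.Properties XML format; the keys, the texts and the
MessageCatalog class are invented for the example, and the real lookup is performed by Spring, not
by code like this.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class MessageCatalog {
    private static final String HEADER =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">";

    // Hypothetical content of a default message file and of a French message file.
    static final String MESSAGES_EN = HEADER + "<properties>"
      + "<entry key=\"search.title\">Search</entry>"
      + "<entry key=\"search.button\">Start search</entry>"
      + "</properties>";
    static final String MESSAGES_FR = HEADER + "<properties>"
      + "<entry key=\"search.title\">Recherche</entry>"
      + "</properties>";

    // Parses a message file given as XML text.
    static Properties load(String xml) {
        Properties properties = new Properties();
        try {
            properties.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new IllegalStateException("Invalid message file", e);
        }
        return properties;
    }

    // Looks the key up in the requested language, then in the fallback, then returns the key itself.
    static String label(Properties localized, Properties fallback, String key) {
        return localized.getProperty(key, fallback.getProperty(key, key));
    }

    public static void main(String[] args) {
        Properties en = load(MESSAGES_EN);
        Properties fr = load(MESSAGES_FR);
        System.out.println(label(fr, en, "search.title"));
        System.out.println(label(fr, en, "search.button"));
    }
}
```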

6.5 NOTICE VIEWER


The Notice viewer is packaged as a tar.gz archive that contains the classes developed for XML
transformations. This archive contains all the dependencies necessary for the XML file transformation
and production management. The notice viewer implementation uses an embedded HSQL database to
hold the reference data and associated translations for multilingual support. The transformations are
performed through XSLT for the generation of HTML and PDF files.

Figure 13 Notice Viewer structure

6.6 REFERENCE DATA


The reference data are the business code data. Each reference data item is composed of a code and 23
translations. All reference data are versioned; some are also hierarchical.
The reference data are:
• Heading
• Country
• Country groups
• Type of authority (sector or awarding authority)
• Contract type (market code)
• Procedure type
• Document type
• Regulation type
• Type of bid
• Award criteria
• CPV code
• Business Sectors
• Main activity
• NUTS code
• Languages
• Extended CPV code (Additional vocabulary)

Reference data stored in the TED system may change over time, and the modifications must be
taken into account, including the relationship between the codes of the new version and those of
the previous one.
Modifications of reference data such as CPV codes have an important impact on the whole TED
system, and especially on the search features.
To handle changes to reference data, the TED system uses a versioning algorithm that translates
codes of the old version into codes of the new one, in order to adapt the search features as far as
possible. The content of the documents is not modified.
The search index will generally be modified after a code change, but the document itself will not
change. Thus, it is possible that a free text search finds documents that do not contain the searched
text.

6.6.1 REFERENCE DATA: DELETION


Version n of the reference data contains codes that have been deleted in version n+1.
When a code is deleted, it is no longer available in the search interface. Documents using the
deleted code can no longer be found using the related criteria.

6.6.2 REFERENCE DATA: ADDITION


A new code has been added to version n+1.
The new code is added to the search interface. There’s no impact on the previous documents.

6.6.3 REFERENCE DATA: MODIFICATION


Several modification types can be foreseen, especially in the case of hierarchical data.

Case 1: Code in version N is replaced by a single code in version N+1.


Documents that use the previous version of the code will be re-indexed so that they can be found
using the associated new version of the code.

Case 2: Code in version N is replaced by multiple codes in version N+1.


In the case of hierarchical data, documents that use the previous version of the code will be re-indexed
so that they can be found using the parent of the code in the old version. If no parent exists, or if the
data is non-hierarchical, this modification will be handled like a deletion.

Case 3: Code in version N moves in the hierarchy in version N+1.


Documents that use the previous version of the code will be re-indexed so that they can be found
using its new parents. Searching for this code using old parent codes will no longer be possible. An
exception to this rule is made for countries: when a country changes group, the documents are not
re-indexed. In such a case, only new documents will be found using the new parent code.
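Cases 1 and 2, together with deletion, can be sketched as a small decision function. This is only an
illustration of the rules stated above, not the TED versioning algorithm: CodeVersionResolver and the
codes used are hypothetical, a code without any registered replacement is treated as deleted, and
Case 3 (a move in the hierarchy) is not covered.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class CodeVersionResolver {
    // Code of version N -> code(s) replacing it in version N+1 (hypothetical mapping data).
    private final Map<String, List<String>> replacements = new HashMap<>();
    // Code of version N -> its parent in version N (only for hierarchical data).
    private final Map<String, String> oldParents = new HashMap<>();

    void replace(String oldCode, String... newCodes) {
        replacements.put(oldCode, Arrays.asList(newCodes));
    }

    void parent(String code, String parentCode) {
        oldParents.put(code, parentCode);
    }

    // Returns the code under which a document using oldCode is re-indexed,
    // or an empty Optional when the change is handled like a deletion.
    Optional<String> reindexCode(String oldCode) {
        List<String> targets = replacements.get(oldCode);
        if (targets == null || targets.isEmpty()) {
            return Optional.empty();                      // deletion
        }
        if (targets.size() == 1) {
            return Optional.of(targets.get(0));           // Case 1: single replacement
        }
        return Optional.ofNullable(oldParents.get(oldCode)); // Case 2: old parent, if any
    }

    public static void main(String[] args) {
        CodeVersionResolver resolver = new CodeVersionResolver();
        resolver.replace("A1", "B1");
        resolver.replace("A2", "B2", "B3");
        resolver.parent("A2", "A");
        System.out.println(resolver.reindexCode("A1"));
        System.out.println(resolver.reindexCode("A2"));
    }
}
```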

Figure 14 Reference Data Model

6.7 CONTENT MODIFICATION


This section describes the procedure that must be followed to add a new form or to update the
reference data.

6.7.1 ADDITION OF A NEW FORM


6.7.1.1 Prerequisite
The following information must be known before the addition of a new form to the TED system:
• Does the form contain specific business functionalities that must be reflected on the TED
website? For instance, for a cancellation document, a “cancelled” indicator is shown on the
impacted document.
• Does the new form contain contracting authority email addresses?
• The labels needed for the document view must be requested, translated into all languages.

6.7.1.2 Tasks
• If the document contains contracting authority email addresses, then the XPath to the tag
containing these addresses must be added to the table DOCUMENT_XML_INFO.
• The new XSLT transformations must be implemented to generate the internal format, the
HTML view, the PDF and the License Holder specific formats.

6.7.2 MODIFICATION OF REFERENCE DATA


Several modification types are foreseen regarding the reference data. First, the information needed
is described. Then, the actions required for each type of modification are explained.

6.7.2.1 Prerequisite
• When a new code is added, the labels in every language must be requested.
• When a new code is added to hierarchical reference data, the place of the code in
the hierarchy must be known.
• If the creation of a new version is foreseen, all the mappings between the current and
the next version must be clearly identified.

6.7.2.2 Modification of an existing reference data version


A change is treated as a modification of an existing reference data version in the following cases:
• Only the labels of existing codes in the current version must be changed.
• New codes are added and all the codes in the current version must be kept.
In these cases, the reference data tables must be updated with the required modifications. Then, if
existing codes are modified, a full re-indexing for the reference data must be performed.
If the procedure type or document type reference data are impacted, please also refer to section
6.7.2.4.

6.7.2.3 Creation of a new reference data version


A new version of the impacted reference data is created in the following cases:
• Some of the codes in the current version are no longer used and must be removed from
the website interface (search mask, browse, ...).
• The code or the meaning of a reference data item changes.
A new version of the reference data must be created in the database:
• CODE_XXX table: mandatory
• CODE_XXX_VERSION table: mandatory
• CODE_XXX_MAPPING table: mandatory
• CODE_XXX_HIERARCHY table: mandatory if the reference data is hierarchical.

When the new version of the reference data becomes valid (not before!):
• All the documents must have been re-indexed using the new version of the code (with the help
of the new mapping).
• In the DOCUMENT table, the column XXX_CURRENT_VERSION must have been updated
with the id of the reference data in the latest version.
If the procedure type or document type reference data are impacted, please also refer to the next
section.

6.7.2.4 Modification of procedure (PR) and document type (TD)


If the procedure type or document type reference data are modified, additional actions must be
performed.

6.7.2.4.1 Procedure (PR) and document type (TD)


The combination of PR and TD indicates whether a document requires a contract award.
Therefore, if a new PR or TD code must be added, it is necessary to know whether the new code is related
to documents that need a reminder.
This information is stored in the XX_CONTRACT_AWARD_NEEDED column of the main reference data
table.

6.7.2.4.2 Document type (TD)


Some document types are used to indicate that a notice is a contract award. Therefore, if a new TD code
is added, it is necessary to know whether the new code should be considered as an awarding type.
If an awarding type must be added or removed, the column
TD_CONTRACT_AWARD_NOTICE of table CODE_DOCUMENT_TYPE must be updated accordingly.
The same procedure must be followed for document types identified as corrigenda: the column
TD_CORRIGENDA of table CODE_DOCUMENT_TYPE must be updated accordingly.

6.8 APPLICATION DEPENDENCIES

APPLICATION DEPENDENCIES
Application                 Layer                External Systems / Dependency
TED Application             Web Layer            Spring MVC, Spring Security
TED Application             Integration Layer    Spring Integration, File System, MySQL Database, Lucene
TED Workflow Application    Web Layer            JSP/Java Servlet
TED Workflow Application    Integration Layer    Spring Integration, File System, XSL, XSL-FO, FOP, iText, James Mail, MySQL Database, Mkisofs, Lucene
Notice Viewer               -                    Spring Integration, XSL, XSL-FO, FOP, HSQL

6.9 BACKUP PROCEDURE


The daily backup consists of a full database backup and an incremental backup of the file system
repository. These backups are configured to run overnight on each production lane.
The MySQL databases are additionally backed up using an export script. This script runs before the
scheduled disk backup so that the exports are also included in that backup. It produces a
standard MySQL database export, which can be used for easy recovery into another database
instance. The script adds another level of fault tolerance for the stored data on top of the replication
mechanism.
Every day, the back-end, front-end, data warehouse and common backups are executed, and a
temporary folder is created on the NFS to hold the different backups. A cron script is
responsible for transferring the result to an external backup unit server.

6.9.1 DAILY BACK-END BACKUP PROCEDURE


Non cluster database backup
To back up the back-end non-cluster databases, a dump is made on each back-end server. All non-cluster
tables and views are dumped. Finally, a restore script is created for each dump.

Cluster database backup


The cluster database is spread over the two production lanes, so the backup dumps the entire database.
To back up the cluster database, the MySQL Node manager is used. It takes a snapshot of each node of
the cluster. Then an archive of each snapshot is made. Finally, a single restore script is created to
restore each node of the cluster.

Repository file system backup


In order to reduce the duration of the file system backup, inotify is used. inotify is a file change
notification system, a kernel feature that allows applications to request the monitoring of a set of
files against a list of events; when an event occurs, the application is notified. It makes it possible
to log all modifications made to a set of folders.
inotify is used on the repository folder to produce a file listing all the files modified since the last
backup. Using this file makes a faster rsync possible.
Finally, the inotify file is also used to create an archive of the set of files modified since the last backup.
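
How a list of modified files can drive a faster rsync is sketched below. This is a minimal sketch, not the actual TED backup script: the function names, the list file name and the destination are hypothetical, and the only rsync feature relied upon is its standard --files-from option, which restricts the transfer to the listed paths.

```python
from pathlib import PurePosixPath


def build_files_from(changed_paths, root):
    """Return sorted, de-duplicated paths relative to root.

    A file modified several times since the last backup appears only
    once. Paths outside root are ignored.
    """
    root = PurePosixPath(root)
    relative = set()
    for raw in changed_paths:
        try:
            relative.add(str(PurePosixPath(raw).relative_to(root)))
        except ValueError:
            continue  # not under the monitored folder
    return sorted(relative)


def rsync_command(list_file, source_root, destination):
    """Build an rsync invocation limited to the files named in list_file."""
    return ["rsync", "-a", f"--files-from={list_file}", source_root, destination]


# Hypothetical events logged since the last backup.
root = "/data/ted-A/ted-data/repository"
changed = [
    root + "/2010/a.xml",
    root + "/2010/a.xml",
    root + "/2010/b.pdf",
    "/tmp/unrelated.txt",
]
files = build_files_from(changed, root)
command = rsync_command("changed-files.txt", root + "/", "backup-host:/data/backup/")
```

The same relative list could also be passed to an archiving tool to build the archive of modified files.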

Backup synchronisation between databases and file system


The backup of the back-end servers’ databases and file system must be synchronised. To achieve this,
the following steps are performed:
1. a lock is put on the databases, to avoid any modification;
2. a snapshot of the inotify file is made;
3. the locks are released.
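
The three steps above can be sketched as a lock, snapshot, release sequence. The document does not say which statements are used to lock the databases; FLUSH TABLES WITH READ LOCK and UNLOCK TABLES are standard MySQL statements chosen here as an assumption, and the cursor below is a stand-in that only records what it is asked to execute.

```python
from contextlib import contextmanager


@contextmanager
def database_locked(cursor):
    """Hold a global read lock for the duration of the block (assumed statements)."""
    cursor.execute("FLUSH TABLES WITH READ LOCK")
    try:
        yield
    finally:
        # The lock is released even if the snapshot fails.
        cursor.execute("UNLOCK TABLES")


def synchronised_snapshot(cursor, snapshot_change_log):
    """Take the change log snapshot while the databases are locked."""
    with database_locked(cursor):
        return snapshot_change_log()


class RecordingCursor:
    """Stand-in for a database cursor, used only to illustrate the order of calls."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


cursor = RecordingCursor()
snapshot = synchronised_snapshot(cursor, lambda: ["2010/a.xml"])
```

Releasing the lock in a finally clause keeps the lock window as short as the snapshot itself and avoids leaving the databases read-only after a failure.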

TED data file system


First, the indexes and the RSS files are backed up from only one of the two servers.
Second, all the logs of both servers are backed up.

6.9.2 DAILY FRONT-END BACKUP PROCEDURE


Database backup
The same backup procedure as back-end non-cluster database backup is used.

TED data file system


The same backup procedure as for the back-end TED data file system is used. In this case, the logs and
the License Holder environment are backed up.

6.9.3 DAILY DATA WAREHOUSE BACKUP PROCEDURE


Database backup
The same backup procedure as for the back-end non-cluster database is used.

6.9.4 DAILY COMMON BACKUP PROCEDURE


The common backup is executed on every server and covers the items that are common to all servers.

Configuration file system


A snapshot of the configuration of each server is backed up.

TED data file system


For each backup, a snapshot of the logs is made.

7 DATA VIEW
This chapter describes the persistent data view of the system. More specifically, it explains the
technical database columns and the functions required to implement version and session
management, optimistic locking and user contexts. The Object-Relational Mapping used to implement
the persistence layer is Spring JDBC.
Full information about the TED data model is available in [R02].

7.1 MYSQL CLUSTER


MySQL Cluster is used for the TED databases. MySQL Cluster is a high-availability, high-redundancy
database adapted for distributed computing environments. It uses the NDBCLUSTER storage
engine to be able to run in a cluster. A MySQL Cluster consists of a set of computers, each running one
or more of the following processes: a MySQL server, a data node and a management server. MySQL Cluster
is used within the TED system for documents and TED website data (also called volatile data). The
relationship of these components in a cluster is shown here:

Figure 15: Database replication with MySQL Cluster


All these elements work together to form a MySQL Cluster. When data is stored in the NDBCLUSTER
storage engine, the tables are stored in the data nodes. Such tables are directly accessible from all
other MySQL servers in the cluster. The data stored in the data nodes for MySQL Cluster is mirrored;
the cluster handles failures of individual data nodes.
The two major types of nodes are described below:

 Data node: This type of node stores cluster data. There are as many data nodes as there are
replicas, times the number of fragments. A fragment is a portion of a database table; a table is
broken up into and stored as a number of fragments. Under the NDB storage engine, each
table fragment has a number of replicas stored on other data nodes in order to provide
redundancy. The TED MySQL Cluster is configured using one fragment and four replicas
giving a total number of four data nodes.
 SQL node: This is a node that accesses the cluster data. In the case of MySQL Cluster, an
SQL node is a traditional MySQL server that uses the NDBCLUSTER storage engine.
The TED system uses one NDB node and one SQL node per back-end server, which makes a total of
four NDB nodes and four SQL nodes. Each production lane has one NDB management server.
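
The node counts stated above can be cross-checked with a few lines of arithmetic. This is only a worked example of the figures given in this document (four replicas, one fragment, two production lanes, two back-end servers per lane); the constant names are illustrative.

```python
# Figures taken from this document.
REPLICAS = 4
FRAGMENTS = 1
PRODUCTION_LANES = 2
BACKEND_SERVERS_PER_LANE = 2

# Rule stated above: data nodes = replicas x fragments.
data_nodes = REPLICAS * FRAGMENTS

# One NDB node and one SQL node per back-end server.
backend_servers = PRODUCTION_LANES * BACKEND_SERVERS_PER_LANE
ndb_nodes = backend_servers
sql_nodes = backend_servers

# One NDB management server per production lane.
management_servers = PRODUCTION_LANES

print(data_nodes, ndb_nodes, sql_nodes, management_servers)
```

Both ways of counting give the same number of data nodes, which is consistent with one data node being hosted on each back-end server.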

7.2 TECHNICAL COLUMNS


Some database tables used to store business entities in the TED system have columns that do not
hold business data but are used only to implement specific functionalities.

7.2.1 AUDIT SEGMENT


Each table of the MySQL TED databases contains a MODIFIED_ON column and a VERSION column
for the optimistic locking and for versioning:
Column Data Type Description
MODIFIED_ON TIMESTAMP The last update date of the record
VERSION INT The version number of the record

Table 5: Modified On and Version columns


Volatile data tables also contain the MODIFIED_BY column that gives the identifier of the user who
has modified/created the entry.
Column Data Type Description
MODIFIED_BY VARCHAR The username of the user who has modified or
created the entry.

Table 6: Modified By column


The data is persisted into the database when the Spring JDBC ‘persist’ method is called.
At that time, a trigger checks that the VERSION field of the updated entity is the same as the one
stored in the database. This verification allows the system to detect whether a newer version has
overwritten the data loaded during the session.
If the version of the record to update matches the version of the record stored in the database,
the record is updated. A trigger is then called, which updates MODIFIED_ON and increments the
VERSION field of the record. Otherwise, an exception is raised, which prevents concurrent modification
of the same entity.
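
The optimistic locking principle can be illustrated with the sketch below. It deliberately swaps in a different technique from the one described above: instead of database triggers, the version check and the increment are done by the UPDATE statement itself, and an in-memory SQLite database replaces MySQL. The NOTICE table, its TITLE column and the StaleRecordError exception are hypothetical.

```python
import sqlite3


class StaleRecordError(Exception):
    """Raised when the record was modified since it was loaded."""


def update_title(conn, record_id, new_title, loaded_version):
    """Update a record only if its VERSION still equals the loaded one."""
    cur = conn.execute(
        "UPDATE NOTICE "
        "SET TITLE = ?, VERSION = VERSION + 1, MODIFIED_ON = CURRENT_TIMESTAMP "
        "WHERE ID = ? AND VERSION = ?",
        (new_title, record_id, loaded_version),
    )
    if cur.rowcount == 0:
        # No row matched: the version changed (or the record does not exist).
        conn.rollback()
        raise StaleRecordError(f"record {record_id} is not at version {loaded_version}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NOTICE ("
    "ID INTEGER PRIMARY KEY, TITLE TEXT, "
    "MODIFIED_ON TIMESTAMP, VERSION INTEGER NOT NULL)"
)
conn.execute("INSERT INTO NOTICE VALUES (1, 'draft', CURRENT_TIMESTAMP, 0)")
conn.commit()

# Two sessions both loaded the record at version 0.
update_title(conn, 1, "first writer", 0)
try:
    update_title(conn, 1, "second writer", 0)
    conflict_detected = False
except StaleRecordError:
    conflict_detected = True

title, version = conn.execute("SELECT TITLE, VERSION FROM NOTICE WHERE ID = 1").fetchone()
```

The second writer is rejected because the stored version has moved on, so the first writer's change is not silently overwritten.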

8 DEPLOYMENT VIEW
The TED system runs on two distinct production lanes. A front-end load balancer is in charge of
routing every request to a production lane, as shown in the figure below:

Figure 16: TED Deployment diagram


The TED modules are deployed on different physical components. The following list contains all the
servers required for one production lane and the association between the module described previously
and the server on which they are deployed.
The frontend Web Server of each production line is composed of:
 An Apache Web Server with a Tomcat Load-balancing module, in charge of routing the HTTP
requests to a Server and serving the static resources (such as pictures, for instance);
 A ProFTPD Server which is used for the License Holder environment module;
 A James email server: The Email analysis and notifications module is deployed on this email
server.
 A MySQL Database for Data warehouse information.
The two backend servers of each production line are composed of:
 A Tomcat Application Server, which hosts the TED website, the data warehouse and the TED
Workflow management tool website.
 The workflow engine modules, which run on a specific JVM.
 A copy of the indexes used for the searches performed by the Web application.
 A copy of the public RSS feeds used by the Web application.
 A MySQL Database for document and volatile data.
The NFS server is mainly in charge of hosting the content library. It uses RAID level 5 to provide a
high level of fault tolerance combined with high performance. The content library is sized to contain all
the received documents from the Office and all the subsequent documents obtained by transformation.
The Network File System (NFS) server is composed of:
 A Windows XP OS running under VMware, used to build the Adobe PDF indexes of the weekly DVD
(using Adobe Professional).
 A MySQL cluster manager node. It is in charge of managing the cluster data replication
between the four instances (two on each production lane) of MySQL Cluster node.

The following schema gives an overview of the deployment of the different modules of the TED system
on the different servers present in one production lane.

Figure 17: Deployment diagram

The production of the content runs in parallel on all production lanes. The objective is to have the
information available on all production lanes, so that if one fails, the other one can
continue to serve the content. The full content library is duplicated on the NFS of each production lane.
The daily switch of the publication day is synchronized on each production lane to ensure that they
serve the same content. Initially the entire TED system is composed of two production lanes.
Some processing steps are executed on only one production lane (e.g. sending of emails). If one
production lane fails, the operator executes these processes on the other lane.
The load balancers dispatch the incoming requests based on the load of each server. Session
forwarding is used to send all requests from one user to the same server.
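
The dispatching rule can be sketched as follows. This is a simplified illustration and not the configuration of the actual load balancers: the class name and server names are hypothetical, and "load" is approximated by the number of requests routed so far, which is an assumption.

```python
class StickyLoadBalancer:
    """Route new sessions to the least loaded server, then keep them there."""

    def __init__(self, servers):
        self.load = {server: 0 for server in servers}
        self.sessions = {}

    def route(self, session_id):
        server = self.sessions.get(session_id)
        if server is None:
            # New session: pick the least loaded server (ties broken by name).
            server = min(self.load, key=lambda s: (self.load[s], s))
            self.sessions[session_id] = server
        self.load[server] += 1
        return server


balancer = StickyLoadBalancer(["backend-1", "backend-2"])
first = balancer.route("user-1")
second = balancer.route("user-2")
again = balancer.route("user-1")
third = balancer.route("user-3")
```

A known session always returns to the server chosen for its first request; only new sessions are balanced on load.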
Some document-related information is replicated on the local MySQL instance of each back-end
server. This task is performed by the production workflow; synchronization steps are used to update
the databases of each back-end server to ensure data coherence.
Volatile database information (e.g. registered user information and document metadata) is inserted in
all database servers simultaneously by the replication mechanisms offered by MySQL Cluster.
The data warehouse information is duplicated. Each data warehouse database instance contains the
information of all production lanes and is filled with the logs of all the servers and processes.

8.1 NETWORK FILE SYSTEM SERVER


Two network file system servers are configured, one on each production lane. Each NFS server holds
the content library and the TED backup, and hosts a Windows XP virtual machine (via VMware) and a
MySQL cluster manager node. The result of the backup procedure is constructed and stored on each NFS;
all this information is then copied to the backup site.

8.1.1 TED REPOSITORY FILE SYSTEM


The TED repository file system has the following structure:
 /data/ted-[A|B]/ted-data/input
 /data/ted-[A|B]/ted-data/repository
 /data/ted-[A|B]/ted-data/dvd
 /data/ted-[A|B]/ted-data/license-holder
where [A|B] means production lane A or production lane B.

TED REPOSITORY
Path Information
/data/ted-[A|B]/ted-data/input contains the original compressed input file
/data/ted-[A|B]/ted-data/repository contains the processed files (XML, PDF, time-stamped PDF)
/data/ted-[A|B]/ted-data/dvd contains the last daily, weekly and monthly DVD
/data/ted-[A|B]/ted-data/license-holder contains the license holder files (UTF-8 and META-XML format)
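
For scripts that need these locations, the layout can be expressed as a small helper. This is a sketch only: the function name is hypothetical, and it assumes that the lane marker expands literally to "ted-A" or "ted-B".

```python
from pathlib import PurePosixPath

SUBFOLDERS = ("input", "repository", "dvd", "license-holder")


def repository_paths(lane):
    """Return the TED repository folders of one production lane ("A" or "B")."""
    if lane not in ("A", "B"):
        raise ValueError(f"unknown production lane: {lane!r}")
    base = PurePosixPath("/data") / f"ted-{lane}" / "ted-data"
    return {name: str(base / name) for name in SUBFOLDERS}


paths_a = repository_paths("A")
paths_b = repository_paths("B")
```

Rejecting any lane other than A or B avoids silently building paths that do not exist on either NFS server.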

8.1.2 TED TEMPORARY BACKUP FILE SYSTEM


Each NFS holds a temporary backup file system, which is built during each backup process. It is a
temporary storage for backed-up data: the whole content of this folder is copied to the backup site in a
second phase.
The TED backup file system has the following structure:

TED BACKUP
Path Information
/data/ted-[A|B]/backup/daily-backup contains the daily backup files
/data/ted-[A|B]/backup/repository contains a full repository backup of /data/ted-[A|B]/ted-data/repository
/home/backup contains backup scripts
/home/backup/log contains the backup of processing logs

8.1.3 TED MIRROR BACKUP FILE SYSTEM


This section contains a description of the filesystem present on the backup machines at the backup
site.
The TED backup file system has the following structure:

TED BACKUP
Path Information
/data/backup/Prodlane-[A|B]/daily-backup contains the mirror of the daily backup files folder
/data/backup/Prodlane-[A|B]/repository contains the mirror of the repository backup folder
/data/backup/Prodlane-[A|B]/repository-delta contains delta repository files
/home/backup contains synchronization backup scripts
/home/backup/log contains synchronization backup processing log

8.1.4 WINDOWS XP VIA VMWARE


The sole purpose of the Windows XP virtual machine (via VMware) is to host Adobe Professional. Adobe
Professional is used to generate the PDX (Acrobat Catalogue Index) file to be included in the
weekly DVD. Samba is used between the virtual Windows machine and its host to share the file
system.

8.2 JAMES EMAIL SERVERS


Two James email servers are installed, one on the front-end of each production lane. This implies a
specific DNS configuration, explained in the following section.

8.2.1 DNS CONFIGURATION


Two MX (Mail Exchanger) entries are registered on the DNS server of “ted.europa.eu”:
 mail1.ted.europa.eu and
 mail2.ted.europa.eu.
These entries ensure that any SMTP request reaches the requested mail server (James of production
line 1 for mail1.ted.europa.eu and James of production line 2 for mail2.ted.europa.eu).
This means that the load balancing process is not triggered when accessing these mail servers.

8.2.2 SPAM FOLDERS


The spam folders are accessible over SSH for browsing, downloading and burning CDs
with the suspected spam content.

8.3 DATABASE ORGANISATION


A TED Production Lane is organised in five databases:
 On each front-end server one TED_DATAWAREHOUSE database: this is the database that
holds the data related to the data-warehouse and monitoring. There’s no replication between
the two production lanes: each TED_DATAWAREHOUSE database contains the full TED
data-warehouse data;
 On each back-end server runs a single MySQL instance. On this instance, tables are created
with two distinct engines: the first one for tables that must be created using the MySQL cluster
and the second one for tables that must be created locally. The TED schema contains the local tables
and the TED cluster schema contains the tables shared over the cluster.
In summary, as there are two production lanes and the cluster database is shared among
all the back-end servers in all production lanes, the total number of MySQL databases is
seven:
- One instance per front-end server for a total of 2 instances.
- One instance per back-end server for a total of 4 instances.
- One cluster instance shared among the back-end servers.
