English Report
University of Applied Sciences of Fribourg · European Organization for Nuclear Research
Diploma Project Bulletin 2 PDA
EIA-FR MISL CERN
Supervisors
Omar Abou Khaled (EIA-FR)
Jean Frédéric Wagen (EIA-FR)
Corrado Pettenati (CERN)
Jean-Yves Le Meur (CERN)
Roberta Faggian (CERN)
External Examiner
Christine Vanoirbeek (EPFL)
Abstract The aim of the project “Bulletin 2 PDA” is to develop an information system which
provides the CERN eBulletin on mobile devices. The main goal is to connect
different database outputs, to convert this data into personalized content and then to
output it in an appropriate format depending on the connected device. The following
paper describes the diploma work, which was carried out in the context of a
collaboration between the University of Applied Sciences of Fribourg and the European
Organization for Nuclear Research, Geneva.
Keywords PDA, Palm, PocketPC, AvantGo, XML, XSL, XSL-Templates, Xalan, FOP, J2EE
Acknowledgments
I am very grateful to Mr Corrado Pettenati, Group leader of the Scientific Information Service in ETT
Division at CERN, and Mr Jean-Yves Le Meur, deputy leader of the Document Handling (DH)
service at CERN, for the proposal of this project and their support during its establishment.
I would like to thank Mr Dr. Omar Abou Khaled, professor for information system technologies, and
Mr Dr. Jean-Frédéric Wagen, professor for telecommunications, both at the University of Applied
Sciences of Fribourg (EIA-FR) at the Laboratory for Mobile Information Systems (MISL), for having
agreed to be my project supervisors. In particular, I am very grateful to Mr Dr. Omar Abou Khaled for
having given me the opportunity to appreciate the course Information System Technologies during
my studies.
A special thanks to Ms Roberta Faggian, designer of the CERN eBulletin system, for the good
collaboration, for the useful advice and for the support she gave me throughout my research.
Moreover, I would like to thank the whole CERN Document Server (CDS) group at CERN ETT/DH
for all their support and endless patience with my questions and my thirst for knowledge.
Special thanks go to my sister Luzia Stankowski for the illustration in the introduction part and to
both Ms Muriel Plattet and Mr Omar Abou Khaled, each of whom read draft chapters and put a good
deal of time and effort into framing detailed comments and suggestions for textual improvements.
Finally, I would like to take this opportunity to thank my patient wife Rebecca for all her support
during this diploma work.
Table of Contents
Chapter 1 Introduction
1.1 The Project
1.1.1 Abstract
1.1.2 Illustration
1.2 Project Environment
1.2.1 What is the CERN?
1.2.2 What is the ETT Division?
Chapter 2 Analysis
2.1 CERN Bulletin
2.1.1 Introduction
2.1.2 Writing an Article?
2.2 Analysis of the current System
2.2.1 Creation and Publication Processes
2.2.2 Output portability
2.2.3 Data input
2.2.4 Data formatting and personalization
2.2.5 Search engines
2.3 Project Specification
Chapter 3 Technologies
3.1 Personal Digital Assistant
3.1.1 What is a PDA?
3.1.2 Market overview
3.2 Connect PDAs to the WWW
3.2.1 Introduction
3.2.2 Connection Possibilities
3.3 Data exchange and transformation
3.3.1 eXtensible Markup Language (XML)
3.3.2 Document Type Definition (DTD)
3.4 Information System Technologies
3.4.1 Apache Webserver
3.4.2 Tomcat J2EE Container
Chapter 4 Design
4.1 System Architecture
4.1.1 Global View
4.1.2 eBulletin-DTD for underlying XML-Model
4.2 Transformer
4.2.1 Introduction
4.4 Filter
4.4.1 Why Content-filtering?
4.4.2 The Filtering-process
4.5 Formatter
4.5.1 Introduction
4.5.2 The Formatting Process
Chapter 6 Implementation
6.1 Classes
6.1.1 Implemented Classes
6.2 Modules
6.2.1 Transformer
6.2.2 Connector
6.3 Global Variables
6.3.1 Path Information
6.3.2 Global Variables
6.4 Resources
6.4.1 Default Values
6.4.2 Published Issues
Chapter 7 Conclusion
7.1 Achieved Results
7.1.1 Project Results
7.1.2 Encountered Challenges
Chapter 1
Introduction
1.1.1 Abstract
Mobile devices and wireless communication are developing rapidly. The number of handheld mobile
devices with access to the Internet and other network applications is exploding, and the business
importance of web-enabled phones and PDAs keeps growing. Moreover, new hardware products such as
the Tablet PC and PDA/phone combinations are starting to reach the market. Wireless LANs,
Bluetooth, 802.11 and other wireless technologies are evolving rapidly.
With powerful handheld devices and sufficient wireless bandwidth, a high demand for information
and services provided for mobile devices can be expected.
To follow this trend, the project “Bulletin 2 PDA” was proposed. It establishes a communication
platform for internal and external information related to the activities of CERN (European
Organization for Nuclear Research, Geneva), based on the CERN Bulletin, and produces formatted
content for mobile devices. The system provides data collection from different database systems,
content personalization and output formatting for devices, focusing in particular on mobile devices.
In order to support the different formats of the different types of information, XML was chosen as
the underlying data format, enabling standardized data exchange between the involved systems.
1.1.2 Illustration
The following figure illustrates the idea and utility of the new eBulletin system by means of
everyday situations:
Imagine yourself waiting for a bus and, instead of doing nothing, checking the official news from
the CERN eBulletin on your PocketPC.
Imagine yourself waiting for a person and, while waiting, getting informed by reading the CERN
news on your mobile phone.
Figure 1.1 At any time, at any place, be informed with the new CERN eBulletin system.
Project Environment
Content
This chapter gives an overview of the European Organization for Nuclear Research (CERN). It
describes the group in which this project took place, narrowing down step by step from the whole
institution to the project group.
Path: CERN → Division ETT → DH Group → CDS Section.
The Laboratory provides state-of-the-art scientific facilities for researchers to use. These are
accelerators which accelerate tiny particles to a fraction under the speed of light, and detectors to
make the particles visible.
Ever since the dawn of civilization, people have endeavoured to learn more about their Universe. The
goal is simply to learn but practical benefits often come later. In the 19th Century, Michael Faraday
was asked by a sceptical member of the British government what was the use of his work on
electricity. His reply showed great foresight: “One day, Sir,” he said, “you may tax it.”
Just as Faraday was driven by the desire to know, the quest for pure knowledge at CERN drives
technology forward. CERN has given the world advances as varied as medical imaging and the
World-Wide Web. But the scientists responsible for these developments were not interested in
medicine or computers. Their motivation was simply to find out.
CERN also plays an important role in advanced technical education. A comprehensive range of
training schemes and fellowships attracts many talented young scientists and engineers to the
Laboratory. Most go on to find careers in industry, where their experience of working in a high-tech
multi-national environment is highly valued.
For this community of physicists, CERN staff designs and builds CERN's intricate machinery and
ensures its smooth operation. It helps prepare, run, analyse and interpret the complex scientific
experiments and carries out the variety of tasks required to make such a special organization
successful.
CERN employs some 2500 people, encompassing a wide range of skills and trades: physicists,
engineers, programmers, technicians, craftsmen, administrators, secretaries and workmen.
Where is CERN?
CERN is on the border between France and Switzerland, just outside Geneva. Its location symbolizes
the international spirit of collaboration which is the reason for the laboratory's success.
More information about CERN can be found at the address: http://www.cern.ch
ETT is responsible for demonstrating and communicating to social groups and society at large, in co-
operation with the collaborating institutes, the scientific results achieved by the CERN programme,
with their cultural and educational implications, as well as the technologies and methods developed in
the accomplishment of CERN's basic mission.
Chapter 2
Analysis
2.1.1 Introduction
- Text provided by Roberta Faggian, CERN -
CERN, like many other organizations, regularly publishes an internal journal which contains
news about the most important projects and facts at CERN and covers the life of the organization
itself (regulations, club activities, etc.). This journal, the CERN Bulletin, is released
weekly in paper and electronic versions.
Every Thursday, after the final check of the publication, the PrintShop starts printing the first
copies of the new issue of the CERN Bulletin. At the same time, a designated person starts
submitting all the articles in electronic form into the database. Another person, from the CERN
Press Office, checks the articles in order to add information (e.g. links to additional
documentation) and correct any typing mistakes.
From time to time we also add new features to the electronic version of the journal. The latest one is
the production of a video news magazine which shows interviews and images about the hot topics
of the last month. It seems to be quite appreciated by the public.
The CERN bulletin journal is available to the public at the following address: http://bulletin.cern.ch.
On the Web, the Official News are hidden from people who connect from outside CERN.
Texts (Word format) and pictures (PICT, TIFF, JPEG and EPS) must be supplied as separate files.
Photos furnished by the clubs to illustrate their articles are welcome.
If you have questions or comments about the Web site contact: Bulletin-Support@listbox.cern.ch
Creation process
In the creation process of the eBulletin, data is collected from different databases, and the useful
data is extracted and then merged into complete information. The result of this merge is then saved
or provided for on-the-fly processing.
Publication process
In the publication process, the previously stored data is processed into the final eBulletin output.
The stored data is read, processed, possibly personalized, and finally formatted
depending on the connected device.
[Figures: the eBulletin system and its outputs (Web, Printing/PDF, Application, Mobile/PDA); open connections are marked with "?"]
Figure 2.9 Current eBulletin output formats
Figure 2.10 Current eBulletin output formats
With the current system, output to different devices is not possible and interaction with other systems
cannot be provided. Because of both its non-modularity and its hardcoded output, the current system
is limited to its present output format.
Project Specification
Content
This chapter provides an overview of the project specification process during the first weeks of the
project. The project and requirement specifications developed in this process conclude the analysis
part of the project.
2.1.5 Objectives
The main objective of the project is to develop a feasibility study for broadcasting the CERN Bulletin
to mobile devices (PDAs, mobile phones, etc.) with an approach based on new technologies. These
technologies offer great possibilities, shown below on two levels:
Technological Level
New technologies offer great advantages:
Reusability of application code
Enhancement of application development cycle
Simplification of product support
Extensibility for future mobile device support
Conceptual Level
Several advanced functions can be realized thanks to these new technologies:
Workflow improvement based on the two current sources (ALEPH and PHOTO databases)
Minimization of data duplication
Usage of a structured data exchange format (XML) which standardizes different data
source inputs and application output
Access to information without locality or domain constraints
Synchronization of personal data with the existing authentication systems (Windows, Unix) using
LDAP (Lightweight Directory Access Protocol); currently there are no user profiles
Improvement of information broadcasting for mobile devices (PDA, mobile phones)
Chapter 3
Technologies
Basic PDAs allow you to store and retrieve addresses and phone numbers, maintain a calendar, and
create to-do lists and notes. More sophisticated PDAs can run word processing, spreadsheet, money
manager, games and electronic book reading programs and also provide email and Internet access.
Some PDAs come with all of the programs included. For others, you have to acquire or purchase extra
software to run these programs. Some PDAs play stereo quality music and record voice memos, while
some others do so with additional hardware.
Most PDAs can exchange information with a desktop or laptop computer, although you may have to
buy additional accessories.
3.1.3 Introduction
Currently, the ability to access the content of the Internet through a PDA is more limited than through
a desktop computer. Some Internet features that are available to most desktop computer users may not
be available to PDA users. For example, PDAs may not allow users to play certain games, use certain
audio or video features, or view information in certain formats like PDF (Portable Document Format)
files. In addition, there is limited support for multimedia programs available on some Web sites.
Many PDAs allow access to e-mail accounts, but some PDAs limit the ability to send, receive, or
view e-mail attachments. Not all devices are able to display attachments in popular formats like MS
Word, Adobe PDF, and HTML without additional software.
Browsers connected to a service (e.g. Blazer): these browsers use a service which first
parses the requested web site, strips out unreadable syntax and then sends the site either in
limited HTML or in a special format to the application on the PDA.
These browsers are very useful for connecting to the eBulletin, so a limited HTML output
should be provided.
Offline by synchronization
Currently, many synchronization services are available for PDA users; a well-known one is the
“AvantGo” service. A user subscribes to so-called channels which provide news content for every
type of interest. Synchronization software has to be installed on a PC to which the PDA connects over
a serial or wireless connection. Every time the user syncs the PDA, the PC connects to the
synchronization service, which itself connects to the channel content provider. Content is verified by
the service provider and useless tags are removed to keep the amount of data small. This data is sent
to the client software on the PC, which delivers the updates to the PDA. This method has the
advantage of delivering data very quickly, because the synchronization is done over a fast connection
and the content is stored in the memory of the PDA.
[Figure: synchronization sequence between PDA, Client PC, AvantGo Server and eBulletin Server: Sync, Connection PDA to AvantGo, Channel Update, Data Delivery, PDA Update]
The XML specification defines a standard way of representing structured data. The data is represented
(mostly) as elements and attributes.
<Weather>
  <City>
    <Name>London</Name>
    <Temperature Units="Fahrenheit">72</Temperature>
    <Temperature Units="Centigrade">25</Temperature>
  </City>
  <City>
    <Name>LA</Name>
    <Temperature Units="Centigrade">25</Temperature>
  </City>
</Weather>
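As a minimal sketch of how such a document can be read programmatically, the following Python snippet parses the weather example with the standard library module xml.etree.ElementTree and shows the difference between element content and attribute values. The function name read_temperatures is an illustrative assumption and not part of the eBulletin system.

```python
import xml.etree.ElementTree as ET

WEATHER_XML = """
<Weather>
  <City>
    <Name>London</Name>
    <Temperature Units="Fahrenheit">72</Temperature>
    <Temperature Units="Centigrade">25</Temperature>
  </City>
  <City>
    <Name>LA</Name>
    <Temperature Units="Centigrade">25</Temperature>
  </City>
</Weather>
"""


def read_temperatures(xml_text):
    """Return {city name: {unit: value}} for the weather document."""
    root = ET.fromstring(xml_text)
    result = {}
    for city in root.findall("City"):
        name = city.findtext("Name")  # element content
        result[name] = {
            t.get("Units"): int(t.text)  # attribute value -> element content
            for t in city.findall("Temperature")
        }
    return result


temperatures = read_temperatures(WEATHER_XML)
```

The city name is stored as element content, while the unit of each temperature is stored as an attribute of the Temperature element.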
The Apache HTTP Server Project is an effort to develop and maintain an open-source HTTP server
for various modern desktop and server operating systems, such as UNIX and Windows NT. The goal
of this project is to provide a secure, efficient and extensible server which provides HTTP services in
sync with the current HTTP standards.
Tomcat is developed in an open and participatory environment and released under the Apache
Software License. Tomcat is intended to be a collaboration of the best-of-breed developers from
around the world.
Chapter 4
Design
Transformer
Data is gathered from different databases, which normally provide XML as output. The Seminars
database, however, does not provide XML output, so a transformation module has to be applied
between the database output and the input to the connector.
Connector
The goal of the connector is to merge the XML output of the databases into a single XML document
(XML input). This procedure implies an XML schema transformation from the schema of the databases
to the schema of the eBulletin system, which is specified in the eBulletin-DTD. The resulting XML-file
contains all data available for one eBulletin issue, including fulltexts and titles in different languages.
Filter
The filter transforms the XML Input-File into a personalized XML-File according to given
parameters. These parameters concern language, output device information, output format, client
location etc.
Formatter
The formatter transforms the personalized XML-File into the requested output format, respecting the
output capabilities of the device. Depending on the requested language, the corresponding dictionary,
which contains label and message information, is added.
The figure below gives a global view of the entire eBulletin system:
[Figure: Databases (Articles, Seminars, Photo Metadata) → Transformer → Connector → XML (1), saveable → Filter (with Params) → XML (2) → Formatter (with Dictionary) → Output (HTML, WAP, PDF/PS, XML)]
Looking at the current eBulletin, a first step is to divide the information into
different categories:
News Articles
Official News
Pension Fund
Seminars
General Information
Staff Association
Social and Culture Events
Extras
These categories are then refined. On closer inspection, all of them contain articles, with the
exception of the category Seminars, which contains seminar information. An article can be split into
the following fields:
Title
Summary
Subject
Fulltext
Person (Author)
Issue number
Article order
Links
etc.
Seminar information is specified through some of the same fields and a few additional ones:
Title
Summary
Subject
Person (Speaker)
Day
Hour
Location
Links
etc.
From this division into fields, three approaches for an XML-model (DTD) were worked out.
It is important to mention that these approaches were constructed with the emphasis on
reflecting the structure of the eBulletin rather than the structure of the data received
from the databases. The following reasons led to this decision:
An adapted XML-model keeps processing as simple as possible in all internal modules.
Using a dedicated XML-model guarantees greater database independence than reusing the
model of the database.
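To make the field list more concrete, here is a small Python sketch that builds one article as an XML element. The element and attribute names (article, title, summary, person, lang, issue, order) and the sample values are illustrative assumptions; the actual names are defined by the eBulletin-DTD, which is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_article(issue, order, titles, summary, author):
    """Build a hypothetical <article> element.

    titles maps a language code to the title in that language.
    """
    article = ET.Element("article", {"issue": str(issue), "order": str(order)})
    for lang, text in titles.items():
        ET.SubElement(article, "title", {"lang": lang}).text = text
    ET.SubElement(article, "summary").text = summary
    ET.SubElement(article, "person", {"role": "author"}).text = author
    return article


# Made-up sample values, for illustration only.
article = build_article(1, 3, {"en": "Hello", "fr": "Bonjour"}, "A short summary.", "J. Doe")
```

Because the model follows the structure of the eBulletin, later modules can address fields such as the title of a given language directly, without knowing how the source database stores them.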
Transformer
Content
This chapter defines the rules for the transformer-module which transforms seminar information (non-
XML-data) into a specified XML-model for further use in the Input-XML file.
4.1.3 Introduction
As XML is used as the data-exchange format inside the application and also for the communication
with other systems, non-XML data must be converted into XML. As the seminars database does not
provide XML output, it is mainly its output that must be converted. For this purpose, IBM
Alphaworks provides a tool called XML Access Service Lightweight Extractor (XLE), which matches
our needs best: it transforms a result set of a relational database into XML. See the chapter
describing this product for more information.
Let us have a look at an example. You can use a simple DTD to create a new DTDSA. For example,
let us suppose that you use the following DTD:
<!DOCTYPE ListOfSeminars [
<!ELEMENT ListOfSeminars (seminar)* >
<!ELEMENT seminar (#PCDATA) >
]>
Figure 4.18 A sample XML output using the DTDSA-file
As you can see, the transformation of non-XML data into XML becomes very simple using this
module. For further questions about functions and declarations, I refer to the documentation that
comes with the XLE package.
Please note that this kind of transformation is not only very simple, but can also be adapted much
faster to database changes, because it works at a declarative level.
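The general idea of turning a relational result set into XML can be sketched in a few lines of Python. This is not XLE and does not use its DTDSA syntax; it is a plain illustration using the standard library, with an in-memory SQLite table, made-up rows and a hypothetical helper rows_to_xml standing in for the seminars database and the extractor. Unlike the sample DTD above, each column becomes a child element here.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(connection, query, root_tag, row_tag):
    """Wrap every row of the result set in a <row_tag> element.

    Each column becomes a child element named after the column.
    """
    root = ET.Element(root_tag)
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        item = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(item, column).text = "" if value is None else str(value)
    return root


# Hypothetical stand-in for the seminars database, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE talk (title TEXT, tday TEXT)")
conn.executemany(
    "INSERT INTO talk VALUES (?, ?)",
    [("Seminar A", "2002-11-04"), ("Seminar B", "2002-11-05")],
)
seminars = rows_to_xml(
    conn, "SELECT title, tday FROM talk ORDER BY tday", "ListOfSeminars", "seminar"
)
```

As in the declarative approach described above, only the query and the tag names need to change when the database layout changes.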
For the eBulletin system I used a DTDSA which matches exactly the format of a report node-set of
the eBulletin-DTD in order to be able to copy the seminars part directly into the XML input-file
(avoiding further XML-transformation). The DTDSA used is the following:
<!DOCTYPE Seminars [
<!ELEMENT Seminars (Report* :: r := SQL("SELECT DISTINCT * FROM TALK left join
AGENDA on TALK.ida=AGENDA.id left join LEVEL as LEVEL1 on AGENDA.fid=LEVEL1.uid
left join LEVEL as LEVEL2 on LEVEL1.fid=LEVEL2.uid where (TO_DAYS(TALK.tday)
BETWEEN TO_DAYS('$in0') and TO_DAYS('$in1')) and (AGENDA.fid = '2l20' or
LEVEL1.fid = '2l20' or LEVEL2.fid = '2l20') order by TALK.tday, TALK.stime")) >
Connector
Content
The following chapters describe the design of the data input-module which connects the databases to
the new eBulletin system.
The answer is XML. Designed for data exchange in a machine- and human-readable structured
format, well standardized and supported, XML offers great advantages. Once transformed to XML, the
raw data becomes organized and can be processed into various other formats.
The Input XML-File serves to provide raw eBulletin data in a standardized and structured data format.
This approach allows simple accessing, modifying and formatting. It is constructed from the results of
database requests (articles and seminars) and from dictionary files.
This XML-File reflects virtually all the information contained in the current issue of the eBulletin. It
can be pre-processed for faster replies and stored temporarily or for longer terms in order to save a
complete copy of the eBulletin in an easily exchangeable format.
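A minimal sketch of the merge step, assuming each database already delivers an XML string: the children of each source document are copied into one section of a single input document. The root element name ebulletin, the section names and the function merge_sources are hypothetical; the real structure is given by the eBulletin-DTD.

```python
import xml.etree.ElementTree as ET


def merge_sources(issue_number, sources):
    """Merge several XML documents into one input document.

    sources maps a section name to the XML string of one data source.
    """
    root = ET.Element("ebulletin", {"issue": str(issue_number)})
    for section, xml_text in sources.items():
        container = ET.SubElement(root, section)
        # Copy the children of the source root; the source root itself is dropped.
        container.extend(list(ET.fromstring(xml_text)))
    return root


merged = merge_sources(
    1,
    {
        "articles": "<result><article>A</article></result>",
        "seminars": "<ListOfSeminars><seminar>S</seminar></ListOfSeminars>",
    },
)
merged_text = ET.tostring(merged, encoding="unicode")
```

The resulting string can be written to disk as the stored copy of an issue or handed directly to the filter for on-the-fly processing.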
Filter
Content
This chapter provides design information about the filter module of the eBulletin system.
The system provides output formatting for different devices. For each output, a new output formatter
is needed. Some parameters are common to all of these formatters, e.g. if the user selects French as
output language, each output will be the French version of the eBulletin. In order to avoid
implementing parameter processing in every single formatter, general parameters such as
language, access restrictions etc. are already applied in the filtering module. The filter removes
unusable or forbidden content from the XML-file, so the formatter module does not need to deal
with it.
Language
Some elements in the eBulletin XML-model are provided in both English and French. In
order to unify and simplify the processing of such elements in the formatters, unused language fields
are filtered out.
[Flowchart: the Language parameter and the XML-file enter the Filter; decisions "elt=t?s?f?" and "attr=lang?" with yes/no branches]
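The language filtering can be sketched as follows. The sketch assumes that bilingual elements carry a lang attribute, which is an assumption about the XML-model, and the function name filter_language is illustrative.

```python
import xml.etree.ElementTree as ET


def filter_language(root, language):
    """Remove every element whose lang attribute differs from language.

    Elements without a lang attribute are language independent and are kept.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            lang = child.get("lang")
            if lang is not None and lang != language:
                parent.remove(child)
    return root


doc = ET.fromstring(
    '<article><title lang="en">Hello</title>'
    '<title lang="fr">Bonjour</title><person>J. Doe</person></article>'
)
filter_language(doc, "fr")
remaining_titles = [t.text for t in doc.findall("title")]
```

After this step, a formatter can simply read the title element without checking the language again.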
Access restrictions
People outside CERN are not allowed to see the category Official News. The current eBulletin
system checks the IP address of the connecting machine and, if it is not within the range of the
CERN network, removes the Official News. However, this technique is not the best solution, because
people working at CERN but connecting from a PC outside CERN cannot read the Official
News.
Besides simply checking the IP address, user profiles could be used, which would not only allow
showing hidden content to registered users but would also allow personalized content. Unfortunately,
the problem at CERN is that there is no common login, meaning a server which can store user
profiles for different applications and control access restrictions. Therefore, new profiles would have
to be stored in the application itself and a new login/password pair created for each user, a
complicated and not very user-friendly approach. There is an ongoing project at CERN dedicated to
providing common login services, so in the future, profiles could be used to personalize content.
The different methods are listed below:
Normal Web access: IP-address checking; in the future, profiles could be used
Mobile devices: IP-address checking and storage of the result in cookies (for browsers which support
cookies)
AvantGo service: rights checking at subscription level, with the result stored together with the
subscription data
For more information about this topic, please consult the implementation part of this report.
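The IP-address check described above can be sketched with Python's ipaddress module. The network range used here is a reserved documentation range (192.0.2.0/24) and only a placeholder; the real CERN ranges, as well as the function names, are assumptions left open in this sketch.

```python
import ipaddress

# Placeholder range (reserved for documentation); NOT the real CERN network.
INTERNAL_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_internal(client_ip, networks=INTERNAL_NETWORKS):
    """Return True if the client address lies in one of the internal networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)


def visible_categories(categories, client_ip):
    """Hide the Official News from clients outside the internal networks."""
    if is_internal(client_ip):
        return list(categories)
    return [c for c in categories if c != "Official News"]
```

A profile-based check could later replace is_internal without changing visible_categories, which is the kind of separation the filter module aims for.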
Personalized content
Everybody using a mobile device has the possibility to personalize the eBulletin to best match
individual interests. For the moment, images and specific categories can be removed, which is very
useful especially for users with limited memory capacity on their PDAs. Moreover, personalization
lets users hide irrelevant information and so helps save time. (People like personalization; if you do
not believe it, have a look at the different desktop backgrounds, mobile phone covers etc.)
[Flowchart: the Personalization parameters and the XML-file enter the Filter; decisions "hide categ?" and "hide img?" with yes/no branches]
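The two personalization decisions (hide a category, hide images) can be sketched as one pass over the XML tree. The element names category and image and the name attribute are assumptions made for illustration; they are not taken from the eBulletin-DTD.

```python
import xml.etree.ElementTree as ET


def personalize(root, hidden_categories=(), hide_images=False):
    """Remove hidden categories and, optionally, all image elements."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "category" and child.get("name") in hidden_categories:
                parent.remove(child)
            elif hide_images and child.tag == "image":
                parent.remove(child)
    return root


doc = ET.fromstring(
    '<ebulletin>'
    '<category name="News Articles"><article><title>A</title>'
    '<image src="a.jpg"/></article></category>'
    '<category name="Seminars"/>'
    '</ebulletin>'
)
personalize(doc, hidden_categories={"Seminars"}, hide_images=True)
```

Removing images before formatting keeps the amount of data small for devices with limited memory.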
External Links
External links are often provided in eBulletin articles and seminar information. Normally this is very
useful: users can get more information about a topic by following such a link. However, when
synchronizing eBulletin content to a mobile device for offline reading, external links could negatively
affect the synchronization process: huge external pages can lead to delayed
synchronization or even to a buffer overflow in the mobile device, so that eBulletin content could not
be saved properly. In order to avoid such behaviour, articles and seminar information are parsed for
external links, which are removed when providing information for a synchronization service.
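A possible sketch of the link removal, assuming links are stored as separate link elements with an href attribute (an assumption about the XML-model). A link counts as external if its host differs from the eBulletin host; relative links have no host and are kept.

```python
from urllib.parse import urlparse
import xml.etree.ElementTree as ET


def strip_external_links(root, internal_host):
    """Remove <link> elements that point to a host other than internal_host."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "link":
                continue
            host = urlparse(child.get("href", "")).hostname
            if host is not None and host != internal_host:
                parent.remove(child)
    return root


doc = ET.fromstring(
    '<article><links>'
    '<link href="http://bulletin.cern.ch/a"/>'
    '<link href="http://www.example.org/big"/>'
    '<link href="/local"/>'
    '</links></article>'
)
strip_external_links(doc, "bulletin.cern.ch")
kept = [link.get("href") for link in doc.iter("link")]
```

This step would only be applied when the output is requested by a synchronization service; online browsers keep the external links.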
Formatter
Content
In this chapter I will give an overview of the formatting process and the provided output formats.
4.1.8 Introduction
In the current eBulletin system, data is not stored separately but only within an HTML formatting
context. As a consequence, it can only be provided in this format and cannot easily be converted into
another format, e.g. a format for mobile devices. With the use of XML as data model and XSL as
formatting language, the new eBulletin system can provide not only HTML output but many
more formats, such as PDA-friendly HTML, WML, PDF etc. The creation of such output
formats is independent of the data gathering and transformation processes and happens in the last
module of the whole system. The advantage is that future extensions are easy and even an
interface to another application can now be realized.
[Figure: the Formatter takes the filtered XML, the dictionary XML and an XSLT file as input and produces the output]
Formatter
the language parameter set. The output is then created using the XSLT file that corresponds to the
requested format.
A separate XSLT file is used for each output format. This file describes the properties of the
output format and how XML content is brought into this format. The formatter reads this policy and
puts it into action.
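The idea of one XSLT file per output format can be sketched as follows. The sketch is an assumption-laden illustration: the class OutputFormatter, the format names html and text, the parameter name lang, and the tiny Article/Title input are all hypothetical, and the stylesheets are inlined as strings only to keep the example self-contained. It uses the standard JAXP API, which Xalan implements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical formatter: picks one stylesheet per requested output format. */
final class OutputFormatter {

    private static final String XSL_NS = "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'";

    // Format name -> stylesheet. In a real system these would be separate XSLT files.
    private static final Map<String, String> STYLESHEETS = new HashMap<>();
    static {
        STYLESHEETS.put("html",
            "<xsl:stylesheet version='1.0' " + XSL_NS + ">"
            + "<xsl:output method='html' indent='no'/>"
            + "<xsl:param name='lang' select=\"'eng'\"/>"
            + "<xsl:template match='/Article'>"
            + "<html lang='{$lang}'><body><h1><xsl:value-of select='Title'/></h1></body></html>"
            + "</xsl:template></xsl:stylesheet>");
        STYLESHEETS.put("text",
            "<xsl:stylesheet version='1.0' " + XSL_NS + ">"
            + "<xsl:output method='text'/>"
            + "<xsl:param name='lang' select=\"'eng'\"/>"
            + "<xsl:template match='/Article'><xsl:value-of select='Title'/></xsl:template>"
            + "</xsl:stylesheet>");
    }

    private OutputFormatter() { }

    static String format(String xml, String outputFormat, String lang)
            throws TransformerException {
        String xsl = STYLESHEETS.get(outputFormat);
        if (xsl == null) {
            throw new IllegalArgumentException("Unknown output format: " + outputFormat);
        }
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setParameter("lang", lang); // language parameter handed to the stylesheet
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Adding a new output format then means adding one stylesheet and one map entry; the Java code itself stays unchanged.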
Chapter 5
Advanced Design
5.1.1 Introduction
In the eBulletin application, several modules must be chained in order to get the requested output.
Xalan provides a functionality called pipelining, which is more powerful than simple chaining of XSL
transformations because it provides streamed input and output. This means that while the output of a
first transformation is still being generated, a second transformation can already start.
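A pipeline of two transformations can be sketched with the standard JAXP SAX interfaces, which Xalan supports. The class name Pipeline and the two-stage layout are assumptions made for illustration. Each stage passes its result to the next stage as SAX events instead of writing an intermediate document; how much of the work actually overlaps depends on the XSLT processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical two-stage pipeline built from SAX transformer handlers. */
final class Pipeline {

    private Pipeline() { }

    static String run(String xml, String firstXsl, String secondXsl) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("SAX pipelining is not supported by this processor");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        // One handler per stylesheet; each handler is also a SAX ContentHandler.
        TransformerHandler stage1 =
                stf.newTransformerHandler(new StreamSource(new StringReader(firstXsl)));
        TransformerHandler stage2 =
                stf.newTransformerHandler(new StreamSource(new StringReader(secondXsl)));

        // Wire the stages: stage1 -> stage2 -> final character output.
        StringWriter out = new StringWriter();
        stage1.setResult(new SAXResult(stage2));
        stage2.setResult(new StreamResult(out));

        // An identity transformer parses the input and feeds it into the first stage.
        Transformer feeder = stf.newTransformer();
        feeder.transform(new StreamSource(new StringReader(xml)), new SAXResult(stage1));
        return out.toString();
    }
}
```

Further stages can be added by creating more handlers and linking each one to its successor with setResult.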
Streamed Output
Stylesheet Compilation
Content
This chapter provides information about what stylesheet compilation is and how it can be used to
accelerate the response time of the eBulletin system.
5.1.3 Introduction
XSLT is a programming language, expressed using XML syntax. This is not for the benefit of the
computer, but rather for human interpretation. Before the stylesheet can be processed, it must be
converted into some internal machine-readable format. This process should sound familiar, because it
is the same process used for every high-level programming language. The programmer works in terms
of the high-level language, and an interpreter or compiler converts this language into some machine
format that can be executed by the computer.
A better approach is to parse the XSLT stylesheet into memory once, compile it into the machine
format, and then keep that machine representation in memory for repeated use. This is called
stylesheet compilation and is no different in concept from the compilation of any programming language.
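In JAXP, the compiled form of a stylesheet is represented by a Templates object. The following sketch shows how such objects could be cached; the class name StylesheetCache and the idea of caching by a stylesheet name are assumptions, not a description of the eBulletin code. A Templates object may be shared between threads, whereas each Transformer created from it should be used by one thread at a time.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical cache: compiles each stylesheet once and reuses the result. */
final class StylesheetCache {

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> compiled = new HashMap<>();
    private int compilations; // counts how often a stylesheet was really compiled

    /** Returns a fresh Transformer backed by the cached, compiled stylesheet. */
    synchronized Transformer transformerFor(String name, String xsl)
            throws TransformerConfigurationException {
        Templates templates = compiled.get(name);
        if (templates == null) {
            // Expensive step: parse and compile the stylesheet, done once per name.
            templates = factory.newTemplates(new StreamSource(new StringReader(xsl)));
            compiled.put(name, templates);
            compilations++;
        }
        // Cheap step: a new Transformer for this request.
        return templates.newTransformer();
    }

    synchronized int compilations() {
        return compilations;
    }
}
```

In a servlet environment such a cache would typically live for the lifetime of the web application, so that only the first request for a format pays the compilation cost.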
Chapter 6
Implementation
6.1 Classes
Content
In the previous chapters we covered all aspects of the system design. Together they provide an
overview of the whole system definition. Let us now have a look at the implementation of the system.
The following lines will cover the characteristics of each of these types.
Servlets
The servlets are the heart of the eBulletin application. They process the HTTP requests coming in to
the application, convert the request parameters into a form usable for stylesheet processing, invoke
the stylesheet processors, interact with the utility classes and external class libraries, and finally
return the transformed data to the client.
Utilities
Modules
Content
This chapter provides an overview of the implementation of the different modules mentioned in the
design part of this report. All the modules are implemented using XML/XSL transformations
(except for the Transformer module) in order to keep them easy to understand and easy to modify.
6.1.2 Transformer
The transformer module transforms a result set from a relational database into XML. Recall that the
transformer uses a library called XLE for this transformation. XSL cannot be applied here, because
the input data is not XML. However, the implementation could still be kept at a high level of
abstraction, because XLE allows the process to be defined by means of an extended DTD, called
DTDSA (see the Design part of this report for more information about this file).
Besides the definition of the DTDSA, the XLE transformer has to be configured for database access
using the configuration file access.cfg. It specifies the driver and the access to the relational
database (with username and password). It looks as follows:
org.gjt.mm.mysql.Driver
jdbc:mysql://cdsdb.cern.ch/AGE?user=[username]&password=[password]
The seminar information comes from the agenda database. Unfortunately, some information is given
only in HTML, so a special conversion is needed in order to keep the generated XML file valid.
In particular, all HTML tags must be turned into text, which can be done by replacing every < with
&lt;. Because no replace command is available for XLE transformations, a new class had to be
implemented and added to the XLE library. Have a look at the implemented class:
/**
* Title: Bulletin2PDA
* Description: Diploma Project
* Copyright: Copyright (c) 2002
* Company: EIA-FR/CERN
* @author Dominik Stankowski
* @version 1.0
*/
import java.io.*;
import java.util.*;
import java.lang.StringBuffer;
import java.util.StringTokenizer;
Figure 6.27 Extension for the XLE library (replace characters during the transformation process)
This file was added to the XLE library, which was renamed to XLE_ds.jar.
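The character replacement itself can be illustrated independently of XLE. The following sketch is not the class shown in Figure 6.27 and does not use the XLE extension interface; the class name MarkupEscaper is hypothetical. Besides < it also escapes & and >, which is an addition made here so that the resulting text is safe inside an XML element.

```java
/** Hypothetical helper: turns markup characters into XML entities. */
final class MarkupEscaper {

    private MarkupEscaper() { }

    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break; // must be escaped as well
                case '<': sb.append("&lt;"); break;  // the replacement named in the text
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<b>Seminar</b> at 10:00"));
    }
}
```

Processing the string character by character avoids the double escaping that can occur when several replace calls are applied one after another in the wrong order.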
6.1.3 Connector
The connector module joins the provided XML data from different sources and transforms it into the
XML model of the eBulletin application using an XML/XSL transformation. There are two different
data sources, article information in MARC-21 XML format and converted seminar data in XML
format.
Article information has to be extracted from the MARC-21 format and put into the new model.
For this purpose the xsl:key element is used:
<xsl:key name="ArticleSet" match="marc:collection/marc:record"
use="translate(marc:datafield[@tag='650']/marc:subfield[@code='a'],
'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')" />
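To show how such a key can be used, the following Java sketch embeds the key declaration above in a small stylesheet and looks up all records of one category with the key() function. Everything around the key is an assumption made for illustration: the class ArticleKeyDemo, the parameter category, the sample records, and the use of MARC field 245 subfield a as the title to print.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: groups MARC records by lower-cased subject using xsl:key. */
final class ArticleKeyDemo {

    static final String MARC_NS = "http://www.loc.gov/MARC21/slim";

    private static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:marc='" + MARC_NS + "'>"
        + "<xsl:output method='text'/>"
        + "<xsl:param name='category' select=\"'news'\"/>"
        // Same key as in the report: index records by their lower-cased 650$a value.
        + "<xsl:key name='ArticleSet' match='marc:collection/marc:record'"
        + " use=\"translate(marc:datafield[@tag='650']/marc:subfield[@code='a'],"
        + " 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')\"/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select=\"key('ArticleSet', $category)\">"
        + "<xsl:value-of select=\"marc:datafield[@tag='245']/marc:subfield[@code='a']\"/>"
        + "<xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    private ArticleKeyDemo() { }

    /** Builds one sample record with a subject (650$a) and a title (245$a). */
    static String record(String subject, String title) {
        return "<marc:record>"
            + "<marc:datafield tag='245'><marc:subfield code='a'>" + title
            + "</marc:subfield></marc:datafield>"
            + "<marc:datafield tag='650'><marc:subfield code='a'>" + subject
            + "</marc:subfield></marc:datafield>"
            + "</marc:record>";
    }

    /** Returns the titles of all records in the category, each followed by ';'. */
    static String titlesIn(String marcXml, String category) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setParameter("category", category);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(marcXml)), new StreamResult(out));
        return out.toString();
    }
}
```

Because the key value is converted to lower case, records tagged NEWS, News and news all end up in the same set.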
Seminar data is already brought into the right XML model during the transformation from relational
data to XML. This technique minimizes the XSL transformations needed in the connector module, so
a simple xsl:copy-of instruction can be used:
<xsl:copy-of select="/" />
Global Variables
Content
This chapter gives an overview on how path information and global variables are stored in the
Bulletin 2 PDA application.
The following paragraphs show some examples of path information stored in the web.xml file.
These examples show how the different parameters are stored in the web.xml file.
Resources
Content
In this chapter I will explain the use of default values and dictionaries in the Bulletin 2 PDA
application.
The file is stored in XML format, which keeps it easy to configure:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<Preferences>
<Default>
<Lang>eng</Lang>
<Issue>latest</Issue>
<Out>response</Out>
<Img>show</Img>
<Int>false</Int>
<Dev>simple</Dev>
</Default>
</Preferences>
Figure 6.33 Default values are specified in the "default.xml" file
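How such default values could be read and combined with request parameters is sketched below. The class DefaultPreferences and both method names are hypothetical, and the assumption that request parameters carry the same names as the elements in default.xml is mine; the report does not state it.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for a default.xml document as shown in Figure 6.33. */
final class DefaultPreferences {

    private DefaultPreferences() { }

    /** Reads every child element of the Default element into a name/value map. */
    static Map<String, String> load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> values = new LinkedHashMap<>();
        NodeList sections = doc.getElementsByTagName("Default");
        if (sections.getLength() == 0) {
            return values; // no defaults defined
        }
        NodeList children = sections.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                values.put(child.getNodeName(), child.getTextContent().trim());
            }
        }
        return values;
    }

    /** Starts from the defaults and overrides them with non-empty request values. */
    static Map<String, String> resolve(Map<String, String> defaults,
                                       Map<String, String> request) {
        Map<String, String> result = new LinkedHashMap<>(defaults);
        for (Map.Entry<String, String> e : request.entrySet()) {
            if (e.getValue() != null && !e.getValue().isEmpty()) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

With the file from Figure 6.33, a request that only sets Img to hide would still be answered with Lang eng and Issue latest.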
The value latest can be specified for the parameter Issue (see Figure 6.33). The respective files are
shown in the following figures:
The Application
Content
This chapter gives some example outputs of the eBulletin on different devices.
Figure 6.35 CERN eBulletin Mainpage
Figure 6.36 Category News Articles
Figure 6.37 A News Article (with image)
Figure 6.38 A News Article (without image)
Chapter 7
Conclusion
CERN is neither an enterprise nor a university but a scientific organization with
international character :-)
The current eBulletin system was not reusable for broadcasting on mobile devices; so a
completely new system had to be designed.
The current application had to be analysed and the project requirement specifications worked out.
Data sources were not all available in XML. So a converter had to be found in order to
transform relational data into XML.
The running systems at CDS were not very well documented, so a lot of meetings had to be
held in order to get information about these systems. Already designed parts of the new
system had to be redesigned in order to adapt the system to unreported functions or to
respect environment specific definitions.
Article and seminar information was specified very restrictively, so a lot of cases had to be
considered during design and implementation of the system.
Sometimes article and seminar information were only provided in HTML (fulltext, summary
and title), so a lot of painful transformations had to be made in order to get data
completely without any formatting or to do modifications without affecting HTML tags.
Moreover, image links were not provided as separate information in the XML file; they had
to be extracted from the fulltext. Furthermore, images were not always provided in an
icon format.
These are only some of the challenges encountered, and it goes without saying that I learned a lot of
new things from them, which greatly advanced my personal development.
These are a few points that should be respected in order to create a stable release.
Issues of the eBulletin could be floating; this means that content would be provided for the
eBulletin at the actual submission time. The artificial delay between submission and
publication would be eliminated (today eBulletin articles are not published before the
paper version is issued).
MIDlets could bring extended functions to the eBulletin such as transfer of seminar date and
time information into the calendar database of the PDA.
These points should be seriously considered for future development of the eBulletin system.