
Dominik Stankowski · dominik@stankowski.ch
University of Applied Sciences of Fribourg · European Organization for Nuclear Research
Diploma Project “Bulletin 2 PDA”
EIA-FR MISL CERN

in collaboration with the
European Organization for Nuclear Research, Geneva (CERN)

Supervisors
Omar Abou Khaled (EIA-FR)
Jean-Frédéric Wagen (EIA-FR)
Corrado Pettenati (CERN)
Jean-Yves Le Meur (CERN)
Roberta Faggian (CERN)

External Examiner
Christine Vanoirbeek (EPFL)

Abstract The aim of the project “Bulletin 2 PDA” is to develop an information system that delivers
the CERN eBulletin to mobile devices. The main goal is to combine the output of different databases,
to convert this data into personalized content and then to output it in a format appropriate to the
connected device. This report describes the diploma work, carried out in collaboration between the
University of Applied Sciences of Fribourg and the European Organization for Nuclear Research, Geneva.

Keywords PDA, Palm, PocketPC, AvantGo, XML, XSL, XSL-Templates, Xalan, FOP, J2EE


Acknowledgments
I am very grateful to Mr Corrado Pettenati, group leader of the Scientific Information Service in the
ETT Division at CERN, and Mr Jean-Yves Le Meur, deputy leader of the Document Handling (DH)
service at CERN, for proposing this project and for their support throughout its realization.

I would like to thank Dr Omar Abou Khaled, professor of information system technologies, and
Dr Jean-Frédéric Wagen, professor of telecommunications, both at the Laboratory for Mobile Information
Systems (MISL) of the University of Applied Sciences of Fribourg (EIA-FR), for having agreed to be my
project supervisors. In particular, I am very grateful to Dr Omar Abou Khaled for having given me the
opportunity to attend and appreciate the course “Information System Technologies” during my studies.

A special thanks to Ms Roberta Faggian, designer of the CERN eBulletin system, for the good
collaboration, for the useful advice and for the support she gave me throughout my research.

Moreover, I would like to thank the whole CERN Document Server (CDS) group at CERN ETT/DH
for all their support and endless patience with my questions and my thirst for knowledge.

Special thanks go to my sister Luzia Stankowski for the illustration in the introduction part and to
both Ms Muriel Plattet and Mr Omar Abou Khaled, each of whom read draft chapters and put a good
deal of time and effort into framing detailed comments and suggestions for textual improvements.

Finally, I would like to take this opportunity to thank my patient wife Rebecca for all her support
during this diploma work.


Table of Contents
Chapter 1 Introduction
1.1 The Project
1.1.1 Abstract
1.1.2 Illustration
1.2 Project Environment
1.2.1 What is the CERN?
1.2.2 What is the ETT Division?

Chapter 2 Analysis
2.1 CERN Bulletin
2.1.1 Introduction
2.1.2 Writing an Article?
2.2 Analysis of the current System
2.2.1 Creation and Publication Processes
2.2.2 Output portability
2.2.3 Data input
2.2.4 Data formatting and personalization
2.2.5 Search engines
2.3 Project Specification

Chapter 3 Technologies
3.1 Personal Digital Assistant
3.1.1 What is a PDA?
3.1.2 Market overview
3.2 Connect PDAs to the WWW
3.2.1 Introduction
3.2.2 Connection Possibilities
3.3 Data exchange and transformation
3.3.1 eXtensible Markup Language (XML)
3.3.2 Document Type Definition (DTD)
3.4 Information System Technologies
3.4.1 Apache Webserver
3.4.2 Tomcat J2EE Container

Chapter 4 Design
4.1 System Architecture
4.1.1 Global View
4.1.2 eBulletin-DTD for underlying XML-Model
4.2 Transformer
4.2.1 Introduction
4.2.2 The XLE Transformer
4.3 Connector
4.3.1 Data Exchange between Systems
4.3.2 The CDS Search
4.4 Filter
4.4.1 Why Content-filtering?
4.4.2 The Filtering-process
4.5 Formatter
4.5.1 Introduction
4.5.2 The Formatting Process

Chapter 5 Advanced Design
5.1 XSL Pipelining
5.1.1 Introduction
5.1.2 Pipelining for the eBulletin
5.2 Stylesheet Compilation
5.2.1 Introduction
5.2.2 Templates API

Chapter 6 Implementation
6.1 Classes
6.1.1 Implemented Classes
6.2 Modules
6.2.1 Transformer
6.2.2 Connector
6.3 Global Variables
6.3.1 Path Information
6.3.2 Global Variables
6.4 Resources
6.4.1 Default Values
6.4.2 Published Issues
6.5 The Application
6.5.1 Output in HTML Format for PDAs
6.5.2 Output in simple HTML Format

Chapter 7 Conclusion
7.1 Achieved Results
7.1.1 Project Results
7.1.2 Encountered Challenges
7.2 eBulletin System Improvements
7.2.1 System Improvements
7.3 A Glimpse into the Future
7.3.1 Work to do on the new eBulletin System
7.3.2 Possible Considerations for the Future


Chapter 1

Introduction

1.1 The Project


Content
This chapter gives a brief overview of the project: its context, main objectives, user needs and
working environment.

1.1.1 Abstract
Mobile devices and wireless communication are developing rapidly. The number of handheld devices
with access to the Internet and other network applications is exploding, and the business importance
of web-enabled phones and PDAs keeps growing. Moreover, new hardware products such as the Tablet PC
and PDA/phone combinations are entering the market. Wireless LANs based on IEEE 802.11, Bluetooth and
other wireless technologies are evolving rapidly.

With powerful handheld devices and sufficient wireless bandwidth, we can expect a high demand for
information and services delivered to mobile devices.

To follow this trend, the project “Bulletin 2 PDA” was proposed. Based on the CERN Bulletin, it
establishes a communication platform for internal and external information related to the activities of
CERN (European Organization for Nuclear Research, Geneva) and produces formatted content for mobile
devices. The system collects data from different database systems, personalizes the content and formats
the output for the connected device, focusing mainly on mobile devices. In order to support the different
formats of the different types of information, XML was chosen as the underlying data format, enabling
standardized data exchange between the involved systems.

1.1.2 Illustration
The following figure illustrates, by means of everyday situations, the idea and usefulness of the
new eBulletin system:


(Illustration: “Bulletin 2 PDA” at the centre, surrounded by four everyday situations.)

Imagine somebody asks you where a seminar takes place: instead of having no answer, you consult your
personalized Seminars information on your PocketPC.

Imagine yourself waiting for a bus: instead of doing nothing, you check the official news from the
CERN eBulletin on your PocketPC.

Imagine yourself queuing up in front of a desk: instead of wasting your precious time, you read the
CERN eBulletin on your Palm.

Imagine yourself waiting for a person: while waiting, you get informed by reading the CERN news on
your mobile phone.

Figure 1.1 At any time, at any place, be informed with the new CERN eBulletin system.


1.2 Project Environment
Content
This section gives an overview of the European Organization for Nuclear Research (CERN). It
identifies the group in which this project took place, narrowing down step by step from the whole
institution to the project group.
Path: CERN – ETT Division – DH Group – CDS Section.

1.2.1 What is the CERN?


CERN is the European Organization for Nuclear Research, the world's largest particle physics centre.
Founded in 1954, the laboratory was one of Europe's first joint ventures, and has become a shining
example of international collaboration. From the original 12 signatories of the CERN convention,
membership has grown to the present 20 Member States. CERN explores what matter is made of, and
what forces hold it together.

The Laboratory provides state-of-the-art scientific facilities for researchers to use: accelerators,
which accelerate tiny particles to just under the speed of light, and detectors, which make the
particles visible.

Ever since the dawn of civilization, people have endeavoured to learn more about their Universe. The
goal is simply to learn, but practical benefits often come later. In the 19th century, Michael Faraday
was asked by a sceptical member of the British government what the use of his work on electricity
was. His reply showed great foresight: “One day, Sir,” he said, “you may tax it.”

Just as Faraday was driven by the desire to know, the quest for pure knowledge at CERN drives
technology forward. CERN has given the world advances as varied as medical imaging and the
World-Wide Web. But the scientists responsible for these developments were not interested in
medicine or computers. Their motivation was simply to find out.

CERN also plays an important role in advanced technical education. A comprehensive range of
training schemes and fellowships attracts many talented young scientists and engineers to the
Laboratory. Most go on to find careers in industry, where their experience of working in a high-tech
multi-national environment is highly valued.


Figure 1.2 CERN, Electron-Accelerator-Ring
Figure 1.3 CERN, Detector

Who works at CERN?


Some 6500 scientists, half of the world's particle physicists, come to CERN for their research. They
represent 500 universities and over 80 nationalities.

For this community of physicists, CERN staff designs and builds CERN's intricate machinery and
ensures its smooth operation. It helps prepare, run, analyse and interpret the complex scientific
experiments and carries out the variety of tasks required to make such a special organization
successful.

CERN employs some 2500 people, encompassing a wide range of skills and trades - physicists,
engineers, programmers, technicians, craftsmen, administrators, secretaries, workmen …

Where is CERN?
CERN is on the border between France and Switzerland, just outside Geneva. Its location symbolizes
the international spirit of collaboration which is the reason for the laboratory's success.

Where the Web was born


In late 1990, Tim Berners-Lee, a CERN computer scientist, invented the World Wide Web. The
"Web", as it is affectionately called, was originally conceived and developed for the large high-energy
physics collaborations, which need instantaneous information sharing between physicists working in
different universities and institutes all over the world. Now it has millions of academic and
commercial users.
Together with Robert Cailliau, Tim wrote the first WWW client (a browser-editor running under
NeXTStep) and the first WWW server along with most of the communications software, defining
URLs, HTTP and HTML. In December 1993 the WWW received the IMA award, and in 1995 Tim and
Robert shared the Association for Computing Machinery (ACM) Software System Award for developing
the World-Wide Web with M. Andreessen and E. Bina of NCSA.

More information about CERN can be found at: http://www.cern.ch


1.2.2 What is the ETT Division?


ETT = Education and Technology Transfer Division
Website: http://cern.web.cern.ch/CERN/Divisions/ETT

CERN has the following 16 divisions:


AC Directorate of Accelerators
AS Administrative Support
DSU Directorate Service Unit
EP Experimental Physics
EST Engineering Support and Technologies
ETT Education and Technology Transfer
FI Finance
HR Human Resources
IT Information Technology
LHC Large Hadron Collider
PS Proton Synchrotron
SL SPS+LHC
SPL Supplies, Procurement and Logistics
ST Technical Support
TH Theory
TIS Technical Inspection and Safety

In co-operation with the collaborating institutes, ETT is responsible for demonstrating and
communicating to social groups and to society at large the scientific results achieved by the CERN
programme, with their cultural and educational implications, as well as the technologies and methods
developed in accomplishing CERN's basic mission.

Figure 1.4 Structure of the ETT-Division


Figure 1.5 Location of the CDS Section at CERN


Chapter 2

Analysis

2.1 CERN Bulletin


Content
This section provides an overview of the current CERN Bulletin system. The workflow is described in
detail; it is not necessary to understand each of its steps to follow the design of this project, but
together they convey the complexity of the whole system.

2.1.1 Introduction
- Text provided by Roberta Faggian, CERN -

Like many other organizations, CERN regularly publishes an internal journal which contains news
about the most important projects and facts at CERN and about the life of the organization itself
(regulations, club activities, etc.). This journal, the CERN Bulletin, is released weekly in paper and
electronic versions.

Every Thursday, after the final checking of the publication, the PrintShop starts printing the first
copies of the new issue of the CERN Bulletin. At the same time, a designated person starts submitting
all the articles in electronic version into the database. Another person, from the CERN Press Office,
checks the articles in order to add some information (e.g. links to additional documentation) and
corrects any typing mistakes.

From time to time we also add new features to the electronic version of the journal. The latest one is
the production of a video news magazine which shows interviews and images about the “hot topics”
of the last month. It seems to be well appreciated by the public.

The CERN bulletin journal is available to the public at the following address: http://bulletin.cern.ch.
On the Web, the Official News section is hidden from people who connect from outside CERN.


2.1.2 Writing an Article?


If a person wishes to publish information in the Weekly Bulletin or has an idea for an article, the
following procedure applies:

An idea for an article on the first pages?


You can send your suggestion by e-mail to the following address:
Bulletin-Editors@listbox.cern.ch
Tel. 79971
A seminar announcement or general information?
The official news, general information or seminar announcements must be sent before Tuesday 12.00 to:
Weekly.Bulletin@cern.ch
Tel. 73830
News from clubs?
Articles about CERN clubs in the Staff Association part of the Bulletin must be sent before Tuesday 12.00 to:
Staff.Bulletin@cern.ch
Tel. 72819

Texts (Word format) and pictures (pict, tiff, jpeg and eps) must be submitted as separate files.
Photos furnished by the clubs to illustrate their articles are welcome.

If you have questions or comments about the Web site contact: Bulletin-Support@listbox.cern.ch


Figure 2.6 Cover Article of the Paper Version of Bulletin #46/2002


2.2 Analysis of the current System


Content
In this section we take a closer look at the different modules of the current system and analyze
how PDA output can be provided in the given context. We also point out current problems, which are
addressed later in the project specification.

2.2.1 Creation and Publication Processes


The production of an eBulletin issue involves two main processes: the creation and the publication
of the Bulletin. We analyze both and specify the related components.

Creation process
In the creation process of the eBulletin, data is collected from different databases, the useful data is
extracted and then merged into complete information. The result of this merge is then either stored or
passed on directly for on-the-fly processing.

(Diagram: Query article database, Query seminar database and Query photo database, each followed by
Extract useful data; then Merge information, leading either to Store extracted data or to Use extracted
data on-the-fly.)

Figure 2.7 Creation of an eBulletin issue (Gathering and Merging information)
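The gathering and merging steps of Figure 2.7 can be sketched in Java with the standard DOM API (javax.xml.parsers) and an identity transformation for serialization. This is a minimal illustration under stated assumptions, not the project's implementation: the Item class and the element names (issue, articles, article, seminar, photo) and the category attribute are hypothetical and do not reproduce the eBulletin DTD. The records are also passed in as lists instead of being queried from the article, seminar and photo databases.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical merge step; element names are illustrative, not the eBulletin DTD. */
final class IssueMerger {

    /** One record extracted from a source database: a title and a category tag. */
    static final class Item {
        final String title;
        final String category;

        Item(String title, String category) {
            this.title = title;
            this.category = category;
        }
    }

    /** Merges the extracted records of the three sources into one issue document. */
    static Document merge(String issueId, List<Item> articles, List<Item> seminars,
                          List<Item> photos) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element issue = doc.createElement("issue");
        issue.setAttribute("id", issueId);
        doc.appendChild(issue);
        issue.appendChild(section(doc, "articles", "article", articles));
        issue.appendChild(section(doc, "seminars", "seminar", seminars));
        issue.appendChild(section(doc, "photos", "photo", photos));
        return doc;
    }

    /** Builds one section element holding the records of a single source. */
    private static Element section(Document doc, String name, String itemName, List<Item> items) {
        Element section = doc.createElement(name);
        for (Item item : items) {
            Element e = doc.createElement(itemName);
            e.setAttribute("category", item.category);
            e.setTextContent(item.title);
            section.appendChild(e);
        }
        return section;
    }

    /** Serializes the merged document (identity transform) so that it can be stored. */
    static String toXml(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the merged result is plain XML without any layout, the same string can be stored for later publication or handed directly to on-the-fly processing.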

Publication process
In the publication process, the previously stored data is processed into the final eBulletin output.
The stored data is read, possibly personalized, and finally formatted depending on the connected
device.


(Diagram: Get extracted data on-the-fly or Load extracted data; Personalize data; then Apply
layout-formatting separately for Output on device 1, Output on device 2 and Output on device 3.)

Figure 2.8 Publication of an eBulletin issue (Personalization and Formatting)
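The personalization step of Figure 2.8 can be sketched as a filter over the merged XML. This is a hypothetical sketch rather than the project's filter: it assumes that every selectable item carries a category attribute and that a user's preferences are a set of category names. It uses the standard JAXP XPath API (javax.xml.xpath) to find the candidate items and removes those the user has not selected.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical personalization step: keeps only items in the user's preferred categories. */
final class Personalizer {

    static Document personalize(String issueXml, Set<String> preferredCategories) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(issueXml)));
            // Assumption: every selectable item carries a "category" attribute.
            NodeList found = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//*[@category]", doc, XPathConstants.NODESET);
            List<Element> items = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                items.add((Element) found.item(i));
            }
            // Remove in a second pass so the XPath result is not read while the tree changes.
            for (Element item : items) {
                if (!preferredCategories.contains(item.getAttribute("category"))) {
                    item.getParentNode().removeChild(item);
                }
            }
            return doc;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("cannot personalize issue", e);
        }
    }
}
```

Since the result is still an XML document, the personalized data can afterwards be passed to any device-specific layout-formatting step.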

How are these steps respected in the current eBulletin system?


The current eBulletin system neither separates the steps mentioned above nor clearly distinguishes
between the two processes. What is stored is not the information itself, once collected and extracted
from the databases, but the final formatted eBulletin. Such a workflow has some important
disadvantages:
Limited extensibility because of non-modular processing: the different steps are not well
separated but are sometimes merged together, which restricts future changes.
No reuse of extracted data, because it is stored together with its formatting: the stored data cannot
be output to another device, since its formatting is not compatible with some mobile devices. This
disadvantage alone requires a complete redesign of the eBulletin system.
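The modular alternative can be illustrated by giving each step its own narrow interface. In this hypothetical Java sketch, the stage names Connector, Filter and Formatter are borrowed from the module names of the design chapters, but their signatures are assumptions, and plain strings stand in for the XML data. The point is that collecting and personalizing operate once on format-neutral data, and only the final formatter differs per device.

```java
import java.util.function.Function;

/** Hypothetical sketch of a modular eBulletin pipeline; stage signatures are assumptions. */
final class BulletinPipeline {

    /** Collects and merges the raw data of one issue into a format-neutral representation. */
    interface Connector {
        String fetch(String issueId);
    }

    /** Personalizes format-neutral data; it never deals with layout. */
    interface Filter extends Function<String, String> { }

    /** Produces the final layout for one kind of device. */
    interface Formatter extends Function<String, String> { }

    private final Connector connector;
    private final Filter filter;

    BulletinPipeline(Connector connector, Filter filter) {
        this.connector = connector;
        this.filter = filter;
    }

    /** Runs the shared, format-neutral steps, then applies the device-specific formatter. */
    String publish(String issueId, Formatter formatter) {
        return filter.andThen(formatter).apply(connector.fetch(issueId));
    }
}
```

Supporting a new device then only requires a new Formatter; the Connector and Filter stay untouched, and their output can be reused for every device.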

2.2.2 Output portability


The current eBulletin system provides output to the Web: anybody can access the website of the
CERN eBulletin on the Internet at http://bulletin.cern.ch. But what about information access from
other devices?

(Diagrams: the eBulletin System with the outputs WEB, Mobile/PDA and Printing/PDF, with question
marks on the Mobile/PDA and Printing/PDF paths; Figure 2.10 additionally shows an Application.)

Figure 2.9 Current eBulletin output formats
Figure 2.10 Current eBulletin output formats

With the current system, output to different devices is not possible and interaction with other systems
cannot be provided. Because of both its non-modularity and its hard-coded output, the current system is
limited to its present output format.
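Serving several output formats from one system first requires knowing which kind of device is connected. One common approach, assumed here rather than taken from the project, is to inspect the HTTP User-Agent header sent by the client. The marker substrings tested below ("avantgo", "palmos", "palmsource", "windows ce") are illustrative assumptions and would have to be checked against the devices actually in use.

```java
import java.util.Locale;

/** Hypothetical device detection from the HTTP User-Agent header. */
final class DeviceDetector {

    enum Device { AVANTGO, PALM, POCKET_PC, DESKTOP }

    /** Maps a User-Agent string to a device type; unknown clients get the desktop layout. */
    static Device detect(String userAgent) {
        if (userAgent == null) {
            return Device.DESKTOP;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        // Assumed marker substrings; verify them against real client strings.
        if (ua.contains("avantgo")) {
            return Device.AVANTGO; // checked first: the AvantGo client runs on Palm and Pocket PC
        }
        if (ua.contains("palmos") || ua.contains("palmsource")) {
            return Device.PALM;
        }
        if (ua.contains("windows ce")) {
            return Device.POCKET_PC;
        }
        return Device.DESKTOP;
    }
}
```

In a servlet container such as Tomcat, the header would be obtained with request.getHeader("User-Agent") and the resulting Device used to choose the output format.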


2.3 Project Specification
Content
This section gives an overview of the specification process carried out during the first weeks of
the project. The project and requirement specifications resulting from this process conclude the
analysis part of the project.

2.3.1 Objectives
The main objective of the project is to carry out a feasibility study for broadcasting the CERN
Bulletin to mobile devices (PDAs, mobile phones, etc.) with an approach based on new technologies.
These technologies offer great possibilities, shown below on two levels:

Technological Level
New technologies offer great advantages:
Reusability of application code
Enhancement of application development cycle
Simplification of product support
Extensibility for future mobile device support

Conceptual Level
Several advanced functions can be realized thanks to these new technologies:
Workflow improvement based on the current two sources (ALEPH and PHOTO databases)
  Minimization of data duplication
  Usage of a structured data exchange format (XML) which standardizes different data
  source inputs and application output
Access to information without locality or domain constraints
Synchronization of personal data with the existing authentication systems (Windows, Unix) using
LDAP (Lightweight Directory Access Protocol); currently there are no user profiles
Improvement of information broadcasting for mobile devices (PDA, mobile phones)
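User profiles could be looked up in an existing directory through JNDI, the standard Java naming API. The sketch below is hypothetical: the person object class, the uid attribute and the server URL are assumptions, since the text does not describe the CERN directory schema. It shows the part that is easy to get wrong, namely escaping the user name before inserting it into an LDAP search filter as required by RFC 4515, together with the environment for the JDK's built-in LDAP provider.

```java
import java.util.Hashtable;
import javax.naming.Context;

/** Hypothetical helper for looking up a user profile in an LDAP directory via JNDI. */
final class LdapProfileQuery {

    /** Escapes a value for use inside an LDAP search filter, as required by RFC 4515. */
    static String escapeFilterValue(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Search filter for one user's entry; "person" and "uid" are assumed schema names. */
    static String userFilter(String login) {
        return "(&(objectClass=person)(uid=" + escapeFilterValue(login) + "))";
    }

    /** JNDI environment for the JDK's built-in LDAP provider; the URL is a placeholder. */
    static Hashtable<String, String> environment(String providerUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        return env;
    }
}
```

With a reachable server, the environment would be passed to new InitialDirContext(env), and the filter to DirContext.search(baseDn, filter, controls) together with a SearchControls object.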


2.3.2 Vision of the New Structure


The new structure is illustrated in the following scheme:

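(Diagram: the MySQL Articles, MySQL Seminars and Photo sources, together with MySQL Params, feed the
eBulletin XML; the stylesheets XSL/PDA, XSL/CP, XSL/M and XSL/V transform this XML for PDA, Cell Phone,
Mobile, PC, Voice and future devices.)

Figure 2.11 Vision of a new eBulletin system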
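The right-hand side of the scheme, one XML source and one stylesheet per device, maps directly onto the JAXP transformation API (javax.xml.transform). In this sketch, the two inline stylesheets are hypothetical stand-ins for XSL/PDA and XSL/CP, and the device keys "pda" and "phone" are assumptions. Each stylesheet is compiled once into a Templates object, which JAXP allows to be reused across requests, and a fresh Transformer is created for each transformation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical device-specific formatter: one compiled stylesheet per device type. */
final class DeviceFormatter {
    private static final String XSL_HEAD =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // Stand-in for XSL/PDA: an HTML list of article titles.
    static final String XSL_PDA = XSL_HEAD
            + "<xsl:output method='html'/>"
            + "<xsl:template match='/issue'><ul><xsl:for-each select='articles/article'>"
            + "<li><xsl:value-of select='.'/></li></xsl:for-each></ul></xsl:template>"
            + "</xsl:stylesheet>";

    // Stand-in for XSL/CP: plain text, one article title per line.
    static final String XSL_CP = XSL_HEAD
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/issue'><xsl:for-each select='articles/article'>"
            + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text></xsl:for-each></xsl:template>"
            + "</xsl:stylesheet>";

    private final Map<String, Templates> byDevice;

    DeviceFormatter() {
        TransformerFactory factory = TransformerFactory.newInstance();
        byDevice = Map.of("pda", compile(factory, XSL_PDA), "phone", compile(factory, XSL_CP));
    }

    /** Compiles a stylesheet once so it can be reused for every request. */
    private static Templates compile(TransformerFactory factory, String xsl) {
        try {
            return factory.newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("invalid stylesheet", e);
        }
    }

    /** Transforms the issue XML with the stylesheet registered for the given device. */
    String format(String issueXml, String device) {
        Templates templates = byDevice.get(device);
        if (templates == null) {
            throw new IllegalArgumentException("unsupported device: " + device);
        }
        StringWriter out = new StringWriter();
        try {
            templates.newTransformer().transform(
                    new StreamSource(new StringReader(issueXml)), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
        return out.toString();
    }
}
```

Supporting a future device then amounts to adding one stylesheet to the map, without touching the XML data or the other stylesheets.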


Chapter 3

Technologies

3.1 Personal Digital Assistant


Content
This section covers general aspects of mobile computing, such as the differences between devices and
the operating systems they use.

3.1.1 What is a PDA?


A personal digital assistant (PDA) is a hand-held computer that allows you to store, access, and
organize information. Most PDAs work on either a Windows-based or a Palm operating system.
PDAs can be screen-based or keyboard-based, or both.

Basic PDAs allow you to store and retrieve addresses and phone numbers, maintain a calendar, and
create to-do lists and notes. More sophisticated PDAs can run word processing, spreadsheet, money
manager, games and electronic book reading programs and also provide email and Internet access.
Some PDAs come with all of these programs included; for others, you have to acquire extra software to
run them. Some PDAs play stereo-quality music and record voice memos, while others need additional
hardware to do so.

Most PDAs can exchange information with a desktop or laptop computer, although you may have to
buy additional accessories.

3.1.2 Market overview


The history of handheld devices is short but successful, yielding sales of over $436 million in
1999. Since the two leading vendors in the industry, Palm and Microsoft, launched their handheld
operating systems (OS) in 1996, Microsoft has released three revisions of its OS, and Palm has
assembled a device line of four major product series. The fundamental difference between Palm and
Microsoft is that Palm licenses its OS and also sells its own line of handhelds, while Microsoft's
PocketPC provides an operating environment for hardware manufactured by other companies such as
Compaq, NEC, HP and Casio.


Palm dominates market share.


PalmOS-based devices, namely Palm's own systems, have dominated the market for years. According to
NPD Intelect, in August 2000 Palm devices held 70.3% of the market, Handspring 15.5%, Compaq 6.6%,
Hewlett Packard 1.7% and Casio 1.4%. Combined, handhelds based on Palm OS (Palm and Handspring) hold
over 85% market share, while PocketPC devices are a distant second with approximately 10%. However,
PocketPC-based devices are gaining ground. With its iPaq, introduced in July 2000, Compaq has risen
from a negligible market share to third place behind Palm and Handspring. Microsoft's recent
initiatives may also draw a larger user base towards PocketPC.

Wireless connectivity is a near-term priority.


The near-term goals for PalmOS, PocketPC and their competitors are based on the need to capture and
maintain market share. Importantly, these companies are focused on snaring corporate users with their
new wireless efforts. Even though Palm is currently ahead with its integrated wireless offerings,
Microsoft has recently announced a large speech and mobile access infrastructure initiative. Besides
developing infrastructure, other wireless priorities include partnering with mobile service providers
and with modem and other hardware vendors.

Connect PDAs to the WWW


Content
The following paragraphs describe how a PDA can access the Internet.

3.1.3 Introduction
Currently, the ability to access the content of the Internet through a PDA is more limited than through
a desktop computer. Some Internet features that are available to most desktop computer users may not
be available to PDA users. For example, PDAs may not allow users to play certain games, use certain
audio or video features, or view information in certain formats like PDF (Portable Document Format)
files. In addition, multimedia content offered on some Web sites is only partly supported.

Many PDAs allow access to e-mail accounts, but some PDAs limit the ability to send, receive, or
view e-mail attachments. Not all devices are able to display attachments in popular formats like MS
Word, Adobe PDF, and HTML without additional software.

3.1.4 Connection Possibilities


There are four different ways for PDA devices to access online information:
- Online using an HTML browser
- Online using a WAP browser
- Offline by synchronization
- Via a midlet that connects directly to the application

Online using an HTML browser


There are several browsers for PDAs on the market, and some PDAs come with an HTML browser
already installed. There are two different types of HTML browsers:
Standalone browsers (e.g. Internet Explorer for PocketPC): They work like a normal web
browser. However, some functions (JavaScript, fonts etc.) are limited because of the PDA's
operating system and its smaller memory or screen size. Web sites therefore have to be written
in this restricted subset of HTML.

Browsers connected to a service (e.g. Blazer): They are connected to a service that first
fetches and parses the requested web site, strips out the syntax the device cannot display, and
then sends the site to the application on the PDA, either in limited HTML or in a special format.

Since these browsers are well suited for accessing the eBulletin, a “limited HTML” output should be
provided.

Online using a WAP browser


The WAP protocol is the leading standard for information services on wireless terminals like digital
mobile phones. The WAP standard is based on Internet standards (HTML, XML and TCP/IP). It
consists of a WML language specification, a WMLScript specification, and a Wireless Telephony
Application Interface (WTAI) specification. To fit into a small wireless terminal, WAP uses a Micro
Browser. A Micro Browser is a small piece of software that makes minimal demands on hardware,
memory and CPU. It can display information written in a restricted mark-up language called WML.
The Micro Browser can also interpret a reduced version of JavaScript called WMLScript.

Offline by synchronization
Many synchronization services are available for PDA users. A well-known one is “AvantGo”. A user
subscribes to so-called channels, which provide news content for every type of interest.
Synchronization software has to be installed on a PC to which the PDA connects over a serial or
wireless connection. Every time the user synchronizes the PDA, the PC connects to the
synchronization service, which in turn connects to the channel content provider. The service provider
verifies the content and cuts off useless tags to keep the amount of data small. This data is sent to the
client software on the PC, which passes the updates on to the PDA. The advantage of this method is
that data is delivered very quickly: the synchronization runs over a fast connection, and the data is
then stored in the PDA's memory.

[Figure: PDA, client PC, AvantGo server and eBulletin server in three phases: (1) connection and sync of the PDA to AvantGo, (2) channel update, (3) data delivery and PDA update]
Figure 3.12 A synchronization cycle of the AvantGo system

Data exchange and transformation


Content
This chapter gives an overview of data exchange and transformation standards in the context of XML-
based languages. The new eBulletin system will be based on XML as underlying data exchange and
transformation standard which simplifies both workflow and handling while improving data exchange
and portability.

3.1.5 eXtensible Markup Language (XML)


Extensible Markup Language (XML) is a human-readable, machine-understandable, general syntax
for describing hierarchical data, applicable to a wide range of applications (databases, e-commerce,
Java, web development, searching, etc.). Custom tags enable the definition, transmission, validation,
and interpretation of data between applications and between organizations.

The XML specification defines a standard way of representing structured data. The data is represented
(mostly) as elements and attributes.

<Weather>
  <City>
    <Name>London</Name>
    <Temperature Units="Fahrenheit">72</Temperature>
    <Temperature Units="Centigrade">25</Temperature>
  </City>
  <City>
    <Name>LA</Name>
    <Temperature Units="Centigrade">25</Temperature>
  </City>
</Weather>

Figure 3.13 A Sample XML-File

Figure 3.14 A Tree-representation of the Sample XML-File


The previous XML code can be represented as a tree. The W3C Document Object Model (DOM)
specification defines exactly such a tree representation of a document, which is shown in the figure
above.
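The tree view described above can be made concrete with a short Java sketch. The JDK's built-in DOM parser (javax.xml.parsers) turns the sample weather document into an in-memory tree, and the code then walks it. The class and method names are illustrative and are not part of the eBulletin system.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class WeatherDom {
    // The sample document from Figure 3.13
    static final String SAMPLE =
        "<Weather>"
      + "<City><Name>London</Name>"
      + "<Temperature Units=\"Fahrenheit\">72</Temperature>"
      + "<Temperature Units=\"Centigrade\">25</Temperature></City>"
      + "<City><Name>LA</Name>"
      + "<Temperature Units=\"Centigrade\">25</Temperature></City>"
      + "</Weather>";

    /** Parses the XML text into an in-memory DOM tree. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** Walks the tree and returns one "City: value unit" line per Temperature element. */
    static List<String> temperatures(Document doc) {
        List<String> lines = new ArrayList<>();
        NodeList cities = doc.getElementsByTagName("City");
        for (int i = 0; i < cities.getLength(); i++) {
            Element city = (Element) cities.item(i);
            String name = city.getElementsByTagName("Name").item(0).getTextContent();
            NodeList temps = city.getElementsByTagName("Temperature");
            for (int j = 0; j < temps.getLength(); j++) {
                Element t = (Element) temps.item(j);
                lines.add(name + ": " + t.getTextContent() + " " + t.getAttribute("Units"));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        temperatures(parse(SAMPLE)).forEach(System.out::println);
    }
}
```

Running main prints one line per Temperature element, starting with "London: 72 Fahrenheit".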

3.1.6 Document Type Definition (DTD)


The purpose of a DTD is to define the legal building blocks of an XML document. It defines the
document structure with a list of legal elements. A DTD can be declared inline in your XML
document, or as an external reference.

Why use a DTD?


XML provides an application independent way of sharing data. With a DTD, independent groups of
people can agree to use a common DTD for interchanging data. Your application can use a standard
DTD to verify that data that you receive from the outside world is valid. You can also use a DTD to
verify your own data.
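The following sketch shows DTD checking in Java with the JDK's standard JAXP parser. It validates a document against the DTD declared in the document's own DOCTYPE. The inline DTD reuses the small seminar DTD of Figure 4.17, and the class name is hypothetical. A custom ErrorHandler is needed because, by default, a validating parser only reports validity errors and does not reject the document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdCheck {
    // Inline DTD from the seminar example (Figure 4.17)
    static final String DOCTYPE =
        "<!DOCTYPE ListOfSeminars ["
      + "<!ELEMENT ListOfSeminars (seminar)* >"
      + "<!ELEMENT seminar (#PCDATA) >"
      + "]>";

    /** Returns true if the document is well-formed and valid against the DTD it declares. */
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DOCTYPE
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn validity errors into exceptions instead of mere console reports
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // invalid or not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(DOCTYPE + "<ListOfSeminars><seminar>XML basics</seminar></ListOfSeminars>"));
        System.out.println(isValid(DOCTYPE + "<ListOfSeminars><talk>Not declared</talk></ListOfSeminars>"));
    }
}
```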

Information System Technologies


Content
In this chapter I will mention the technologies used for the new eBulletin system. These are the web
server, the Servlet container, XML/XSL parser, and transformation classes.

3.1.7 Apache Webserver


Apache has been the most popular web server on the Internet since April of 1996. The October 2002
Netcraft Web Server Survey (http://www.netcraft.com/survey) found that 61% of the web sites on the
Internet are using Apache, thus making it more widely used than all other web servers combined.

The Apache HTTP Server Project is an effort to develop and maintain an open-source HTTP server
for various modern desktop and server operating systems, such as UNIX and Windows NT. The goal
of this project is to provide a secure, efficient and extensible server which provides HTTP services in
sync with the current HTTP standards.

Figure 3.15 Netcraft Web Server Survey (October 2002)

For more information see Error: Reference source not found.

3.1.8 Tomcat J2EE Container


Tomcat is the Servlet container that is used in the official Reference Implementation for the Java
Servlet and JavaServer Pages technologies. The Java Servlet and JavaServer Pages specifications are
developed by Sun under the Java Community Process.

Tomcat is developed in an open and participatory environment and released under the Apache
Software License. Tomcat is intended to be a collaboration of the best-of-breed developers from
around the world.

For more information see Error: Reference source not found.

Chapter 4

Design

4.1 System Architecture


Content
This chapter gives an overview of the system architecture and its components.

4.1.1 Global View


Given the project specifications and the existing environment, several design decisions have to be
made. They are briefly explained in the next paragraphs. More detailed information about all modules
can be found in the following chapters.

Transformer
Data is gathered from different databases, normally providing XML as output. The Seminars-
database, however, does not provide XML output, so a transformation module has to be applied
between the database’s output and the input to the connector.

Connector
The goal of the connector is to merge the XML output of the databases into a single XML document
(XML input). This implies an XML schema transformation from the schemas of the databases to the
schema of the eBulletin system, which is specified in the eBulletin-DTD. The resulting XML-file
contains all data available for one eBulletin issue, including fulltexts and titles in different languages.

Filter
The filter transforms the XML Input-File into a personalized XML-File according to given
parameters. These parameters concern language, output device information, output format, client
location etc.

Formatter
The formatter transforms the personalized XML-File into the requested output format, respecting the
output capabilities of the device. The dictionary that matches the requested language, which contains
label and message information, is joined to the data.

The figure below gives a global view of the entire eBulletin system:

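[Figure: Databases (Articles: XML; Seminars: relational data converted to XML by the Transformer; Photo Metadata: XML) feed the Connector, which builds XML (1); the Filter applies the parameters and produces XML (2); the Formatter joins the dictionary and produces the output: HTML, WAP or saveable PDF/PS]
Figure 4.16 Global View of the new eBulletin System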


The following chapters will specify both design and context of each of the transformation modules
listed above.

4.1.2 eBulletin-DTD for underlying XML-Model


The use of XML as underlying data format demands a standardization of the XML-model in order to
guarantee data portability between the different modules inside the system. Therefore, a DTD must be
defined which exactly specifies this model reflecting as precisely as possible the structure of the
system related information.

A first analysis of the current eBulletin divides the information into the following categories:
News Articles
Official News
Pension Fund
Seminars
General Information
Staff Association
Social and Culture Events
Extras

These categories are then refined. A closer look shows that all of them contain articles, with the
exception of the Seminars category, which contains seminar information. An article can be split into
the following fields:
Title
Summary
Subject
Fulltext
Person (Author)
Issue number
Article order
Links
etc.

Seminar information shares some of these fields and adds a few of its own:
Title
Summary
Subject
Person (Speaker)
Day
Hour
Location
Links
etc.

From this splitting into different fields, three approaches for an XML-model (DTD) were worked out.
It is important to mention that these approaches were designed to reflect the structure of the
eBulletin, not the structure of the data received from the databases. The following reasons led to
this decision:
An adapted XML-model keeps processing as simple as possible in all internal modules.
Using our own XML-model guarantees greater database independence than reusing the
model of the database.

Transformer
Content
This chapter defines the rules for the transformer-module which transforms seminar information (non-
XML-data) into a specified XML-model for further use in the Input-XML file.

4.1.3 Introduction
XML is used as the data-exchange format inside the application and also for communication with
other systems, so non-XML data must be converted into XML. Since the seminars-database does not
provide XML output, it is mainly its output that must be converted. For this purpose, IBM
Alphaworks provides a tool called XML Access Service Lightweight Extractor (XLE). It matches our
needs best because it transforms a result-set of a relational database into XML. See chapter Error:
Reference source not found for more information about the product.

4.1.4 The XLE Transformer


The XLE Transformer uses an extended DTD, called DTDSA, to define the XML-model into which
the input should be transformed. The extension consists of mapping information that defines which
field of the ResultSet maps to which element of the XML-file.

Let us have a look at an example. You can use a simple DTD to create a new DTDSA. For example
let’s suppose that you use the following DTD:
<!DOCTYPE ListOfSeminars [
<!ELEMENT ListOfSeminars (seminar)* >
<!ELEMENT seminar (#PCDATA) >
]>

Figure 4.17 A sample DTD-file


For example, imagine you want to map output from the following database table:

Figure 4.18 A sample XML output using the DTDSA-file in Error: Reference source not found
As you can see, the transformation of non-XML data into XML becomes very simple using this
module. For further questions about functions and declarations, please refer to the documentation that
comes with the XLE package.
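XLE's own API is not reproduced here. To give a rough idea of what a result-set-to-XML mapping does, the following Java sketch builds a ListOfSeminars document from rows that are assumed to have already been read from a ResultSet. Each row is represented as a map from column name to value. This is a hand-written stand-in, not XLE, and the column names are hypothetical. They must be valid XML element names.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class RowsToXml {
    /** Maps every row to a <seminar> element with one child element per column. */
    static String toXml(List<Map<String, String>> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("ListOfSeminars");
            doc.appendChild(root);
            for (Map<String, String> row : rows) {
                Element seminar = doc.createElement("seminar");
                for (Map.Entry<String, String> column : row.entrySet()) {
                    Element field = doc.createElement(column.getKey());
                    // The serializer escapes text content, so '<' and '&' in data are safe
                    field.setTextContent(column.getValue() == null ? "" : column.getValue());
                    seminar.appendChild(field);
                }
                root.appendChild(seminar);
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot build XML from rows", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>(); // keeps the column order
        row.put("title", "Dark matter searches");
        row.put("speaker", "A. Speaker");
        System.out.println(toXml(List.of(row)));
    }
}
```

Unlike XLE's declarative DTDSA, the mapping here is hard-coded. That contrast is why the declaration-level approach adapts more easily to database changes.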

Note that this kind of transformation is not only very simple but also adapts much more quickly to
database changes, because it works at the declaration level.

For the eBulletin system I used a DTDSA which matches exactly the format of a report node-set of
the eBulletin-DTD in order to be able to copy the seminars part directly into the XML input-file
(avoiding further XML-transformation). The DTDSA used is the following:
<!DOCTYPE Seminars [
<!ELEMENT Seminars (Report* :: r := SQL("SELECT DISTINCT * FROM TALK left join
AGENDA on TALK.ida=AGENDA.id left join LEVEL as LEVEL1 on AGENDA.fid=LEVEL1.uid
left join LEVEL as LEVEL2 on LEVEL1.fid=LEVEL2.uid where (TO_DAYS(TALK.tday)
BETWEEN TO_DAYS('$in0') and TO_DAYS('$in1')) and (AGENDA.fid = '2l20' or
LEVEL1.fid = '2l20' or LEVEL2.fid = '2l20') order by TALK.tday, TALK.stime")) >

<!ELEMENT Report (Identification, Content, Publication)>


<!-- Seminar: ida + ids + idt, in future: Controlfield tag=001 -->
<!ATTLIST Report RecordId CDATA #REQUIRED : concat(r.ida, r.ids, r.idt) >
<!-- ###################################################################### -->
<!ELEMENT Identification (OldSystemNumber)>
<!-- Seminar: ida + ids + idt, in future: 909 C0s -->
<!ELEMENT OldSystemNumber (#PCDATA : concat(r.ida, r.ids, r.idt)) >
]>

Figure 4.19 DTDSA-File for the eBulletin System


There is a self-created function called “replace(field, str_to_replace, str_new)” which replaces one
string in a field with another. It is used to make HTML tags “XML-compatible”. For more
documentation on how this function was implemented and added to the package, please consult the
implementation part of this report.
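The actual implementation of replace is documented in the implementation part of this report. As an illustration of the idea only, a Java equivalent could look like the sketch below. The method mirrors replace(field, str_to_replace, str_new). The helper xmlCompatible and the particular tags it rewrites are hypothetical examples.

```java
class XmlCompat {
    /** Java counterpart of replace(field, str_to_replace, str_new): replaces every literal occurrence. */
    static String replace(String field, String strToReplace, String strNew) {
        if (field == null || strToReplace.isEmpty()) {
            return field; // NULL columns and empty patterns are left untouched
        }
        return field.replace(strToReplace, strNew);
    }

    /** Hypothetical example: turn common HTML constructs into well-formed XML. */
    static String xmlCompatible(String html) {
        String s = replace(html, "<br>", "<br/>");
        s = replace(s, "<BR>", "<br/>");
        s = replace(s, "<hr>", "<hr/>");
        return replace(s, "&nbsp;", "&#160;"); // &nbsp; is not a predefined XML entity
    }

    public static void main(String[] args) {
        System.out.println(xmlCompatible("First line<br>Second&nbsp;line<hr>"));
    }
}
```

Because the replacements are literal, each HTML variant has to be listed explicitly. This keeps the function predictable but means new tag spellings need new calls.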

Connector
Content
The following chapters describe the design of the data input-module which connects the databases to
the new eBulletin system.

4.1.5 Data Exchange between Systems


Data for the eBulletin comes from different sources – from a MySQL-database (articles), from
another MySQL-database for the seminars and from a dictionary file (XML-file). How can this data
be standardized in order to feed the input of the system? What is a useful base for the different data
formats?

The answer is XML. It is designed for data exchange in a machine- and human-readable structured
format, and it is well standardized and widely supported, so it offers great advantages. Once
transformed to XML, the raw data becomes organized and can be processed into various other formats.

The Input XML-File serves to provide raw eBulletin data in a standardized and structured data format.
This approach allows simple accessing, modifying and formatting. It is constructed out of the result of
database requests (articles and seminars), and dictionary files.

This XML-File contains practically all the information of the current eBulletin issue. It can be
pre-processed for faster replies and stored temporarily, or kept for the long term in order to save a
complete copy of the eBulletin in an easily exchangeable format.
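The following minimal sketch illustrates the connector idea. It assumes that each source has already delivered a well-formed XML fragment, and merges the fragments under one root with DOM's importNode. The root element name Bulletin, its issue attribute and the fragment names are hypothetical. The real structure is defined by the eBulletin-DTD, and the schema mapping step is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class BulletinConnector {
    /** Copies the root element of every source document under one <Bulletin issue="..."> root. */
    static String merge(String issue, String... sources) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document input = builder.newDocument();
            Element bulletin = input.createElement("Bulletin");
            bulletin.setAttribute("issue", issue);
            input.appendChild(bulletin);
            for (String xml : sources) {
                Document part = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
                // importNode(deep = true) copies the whole subtree into the target document
                Node copy = input.importNode(part.getDocumentElement(), true);
                bulletin.appendChild(copy);
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(input), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot merge sources", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(merge("12",
            "<Articles><Article>LHC news</Article></Articles>",
            "<Seminars/>"));
    }
}
```

The sources are appended in the order given, so the resulting input file has a predictable layout for the filter.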

Filter
Content
This chapter provides design information about the filter module of the eBulletin system.

4.1.6 Why Content-filtering?


Considering the global view of the application, we can see that filtering takes place just before the
output formatting module. So what is this filter used for?

The system provides output formatting for different devices, and each output needs its own output
formatter. Some parameters are common to all of these formatters. For example, if the user selects
French as output language, each output will be in the French version of the eBulletin. To avoid
implementing parameter processing in every single formatter, general parameters such as language
and access restrictions are already applied in the filtering module. The filter removes unusable or
forbidden content from the XML-file, so the formatter module does not need to deal with it.

The following parameters are subject to content-filtering:


Language: Text in unused languages can be stripped off, which considerably reduces the size of the
XML-file.
Access restrictions: If access restrictions have to be applied, the filter removes forbidden
content.
User specific content: User rules that demand personalized content filter out unused
categories and build a personalized XML-file.
External links: Links outside the eBulletin server can be filtered out for systems that provide
offline content (e.g. AvantGo).

4.1.7 The Filtering-process


In this chapter I will describe the filtering procedure for each parameter.

Language
Some elements in the eBulletin XML-model are provided in both English and French. To unify and
simplify the processing of such elements in the formatters, fields in the unused language are filtered
out.

The following diagram shows the detailed procedure:

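[Figure: the language parameter and the XML-file enter the filter; for each element, the filter checks whether it is a title, summary or fulltext element and, if so, whether its lang attribute matches the language parameter; if it does not, the element is taken off]
Figure 4.20 Language filter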


The filter searches the XML-file for the elements “title”, “summary”, and “fulltext” and checks their
attribute “lang”. If its value does not match the language parameter, the element is removed.
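Since the filter is implemented with XSL, this rule can be written as an identity stylesheet with a single exception, run here from Java through the JAXP API. This is a sketch under assumptions. The element names title, summary and fulltext and the attribute lang follow the description above. The language codes "en"/"fr" and the choice to always keep elements without a lang attribute are illustrative. XSLT 1.0 does not allow variable references in match patterns, so the language test sits in an xsl:if.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class LanguageFilter {
    // Identity copy of the document, except that title, summary and fulltext elements are
    // kept only if they have no lang attribute or their lang matches the $lang parameter.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:param name='lang' select=\"'en'\"/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='title|summary|fulltext'>"
      + "<xsl:if test='not(@lang) or @lang = $lang'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:if>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String filter(String xml, String lang) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            transformer.setParameter("lang", lang); // overrides the default 'en'
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Language filter failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<article><title lang='en'>New LHC schedule</title>"
                   + "<title lang='fr'>Nouveau calendrier du LHC</title></article>";
        System.out.println(filter(xml, "fr"));
    }
}
```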

Access restrictions
People outside CERN are not allowed to see the category “Official News”. The current eBulletin
system checks the IP address of the connecting machine and, if it is not within the range of CERN's
network, removes the “Official News”. However, this technique is not the best solution, because
people working at CERN but connecting from a PC outside CERN cannot read the “Official
News”.
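The IP-based check itself is simple. The following Java sketch tests whether a client address lies inside a network given in prefix notation. In a servlet, the client address would typically come from getRemoteAddr(). The network in the example, 192.0.2.0/24, is a documentation placeholder and not CERN's actual address range. The address must be an IP literal, because InetAddress.getByName would otherwise perform a DNS lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class IpRange {
    /** Returns true if address lies inside network/prefixLength (IPv4 or IPv6 literals). */
    static boolean inRange(String address, String network, int prefixLength) {
        byte[] a = toBytes(address);
        byte[] n = toBytes(network);
        if (a.length != n.length) {
            return false; // an IPv4 client never matches an IPv6 network and vice versa
        }
        if (prefixLength < 0 || prefixLength > a.length * 8) {
            throw new IllegalArgumentException("Bad prefix length: " + prefixLength);
        }
        int fullBytes = prefixLength / 8;
        int remainingBits = prefixLength % 8;
        for (int i = 0; i < fullBytes; i++) {
            if (a[i] != n[i]) {
                return false; // a whole byte of the network part differs
            }
        }
        if (remainingBits == 0) {
            return true;
        }
        int mask = (0xFF << (8 - remainingBits)) & 0xFF; // high-order bits of the partial byte
        return (a[fullBytes] & mask) == (n[fullBytes] & mask);
    }

    private static byte[] toBytes(String ipLiteral) {
        try {
            return InetAddress.getByName(ipLiteral).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not an IP address: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        // Placeholder network; the real ranges would come from configuration
        System.out.println(inRange("192.0.2.17", "192.0.2.0", 24));
        System.out.println(inRange("198.51.100.7", "192.0.2.0", 24));
    }
}
```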

Besides simply checking the IP address, user profiles could be used. They would not only allow
showing hidden content to registered users but also allow personalized content. Unfortunately, CERN
has no “common login”, i.e. a server that stores user profiles for different applications and controls
access restrictions. Therefore, new profiles must be stored in the application itself and a new
login/password pair must be created for each user, which is complicated and not very user-friendly.
An ongoing project at CERN is dedicated to providing “common login” services, so in the future,
profiles could be used to personalize content.
The different methods are listed below:
Normal Web access: IP address checking; in the future, profiles could be used
Mobile devices: IP address checking, with the result stored in a cookie (for browsers that
support cookies)
AvantGo service: access rights are checked at subscription time and stored together with the
subscription data

For more information about this topic, please consult the implementation part of this report.

Personalized content
Everybody using a mobile device can personalize the eBulletin to best match their individual
interests. For the moment, images and specific categories can be removed, which is especially useful
for users with limited memory on their PDAs. Moreover, personalization lets users hide non-relevant
information and so helps save time. (People like personalization; if you do not believe it, look at the
variety of desktop backgrounds, mobile phone covers etc.)

Let us have a look at the following diagram:

[Figure: the personalization parameters and the XML-file enter the filter; if a category is to be hidden, its node-set is taken off; if images are to be hidden, the image is taken off]
Figure 4.21 Filter for personal content


The filter parses the XML document and checks elements such as “categories” and “images”.
Depending on the submitted parameters, these elements are either copied into the resulting XML
document or removed so that they are hidden from the connecting user.
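A minimal DOM sketch of this personalization step is shown below. The element names category (with a name attribute) and image are assumptions made for the example; the real names are defined by the eBulletin-DTD. Matching elements are collected first and removed afterwards, because the NodeList returned by getElementsByTagName is live and would change while it is being iterated.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class PersonalFilter {
    /** Removes hidden categories and, if requested, all images from the bulletin XML. */
    static String personalize(String xml, Set<String> hiddenCategories, boolean hideImages) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Element> toRemove = new ArrayList<>();
            NodeList categories = doc.getElementsByTagName("category");
            for (int i = 0; i < categories.getLength(); i++) {
                Element category = (Element) categories.item(i);
                if (hiddenCategories.contains(category.getAttribute("name"))) {
                    toRemove.add(category); // the whole node-set of the category goes
                }
            }
            if (hideImages) {
                NodeList images = doc.getElementsByTagName("image");
                for (int i = 0; i < images.getLength(); i++) {
                    toRemove.add((Element) images.item(i));
                }
            }
            for (Element element : toRemove) {
                Node parent = element.getParentNode();
                if (parent != null) {
                    parent.removeChild(element);
                }
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot personalize bulletin", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Bulletin><category name='official'><article>A</article></category>"
                   + "<category name='seminars'><article>B<image src='talk.gif'/></article></category></Bulletin>";
        System.out.println(personalize(xml, Set.of("official"), true));
    }
}
```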

External Links
External links are often provided in eBulletin articles and seminar information. Normally this is very
useful, because users can get more information about a topic by following such a link. However, when
eBulletin content is synchronized to a mobile device for offline reading, external links can affect the
synchronization process negatively. Huge external pages can delay synchronization or even cause a
buffer overflow in the mobile device, so that eBulletin content cannot be saved properly. To avoid this,
articles and seminar information are parsed for external links, which are removed when information
is provided for a synchronization service.
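The following Java sketch shows one way to detect and remove external links. It assumes that links appear as <a href="..."> elements in the article XML and that the eBulletin host name is known; bulletin.example.org is a placeholder. Instead of deleting a link together with its text, the sketch keeps the link text so that the sentence still reads correctly. This is a design choice of the example, not documented behaviour of the system.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class LinkFilter {
    /** A link is external if it names a host other than the eBulletin server. */
    static boolean isExternal(String href, String bulletinHost) {
        try {
            String host = new URI(href.trim()).getHost();
            return host != null && !host.equalsIgnoreCase(bulletinHost); // relative links have no host
        } catch (URISyntaxException e) {
            return true; // unparsable links are treated as external
        }
    }

    /** Replaces every external <a> element by its text content. */
    static String stripExternalLinks(String xml, String bulletinHost) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList anchors = doc.getElementsByTagName("a");
            List<Element> external = new ArrayList<>(); // copy, because the NodeList is live
            for (int i = 0; i < anchors.getLength(); i++) {
                Element a = (Element) anchors.item(i);
                if (isExternal(a.getAttribute("href"), bulletinHost)) {
                    external.add(a);
                }
            }
            for (Element a : external) {
                a.getParentNode().replaceChild(doc.createTextNode(a.getTextContent()), a);
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot strip external links", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p>See <a href='http://www.example.com/'>this page</a>"
                   + " and <a href='/local.html'>that</a>.</p>";
        System.out.println(stripExternalLinks(xml, "bulletin.example.org"));
    }
}
```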

Formatter
Content
In this chapter I will give an overview of the formatting process and the provided output formats.

4.1.8 Introduction
In the current eBulletin system, data is not stored separately but only within an HTML formatting
context. As a consequence, it can only be provided in this format and cannot easily be converted into
another one, e.g. a format for mobile devices. With XML as data model and XSL as formatting
language, the new eBulletin system can provide not only HTML output but many other formats, such
as PDA-friendly HTML, WML and PDF. The creation of these output formats is independent of the
data gathering and transformation processes and happens in the last module of the whole system.
The advantage is that future extensions are easy to add, and even an interface to another application
becomes possible.

4.1.9 The Formatting Process


The formatting process is visualized in the figure below:

[Figure: the filtered XML, the dictionary XML and the formatter XSLT enter the formatter, which produces the output]
Figure 4.22 Formatting process


The formatter takes as input the filtered XML file that the filter module has just created. In the
formatter, this XML-file is joined with the dictionary XML-file that corresponds to the language
parameter. The output is then created in the requested format using the corresponding XSLT-file.

For each different output format, another XSLT-file is used. This file contains information about the
output format properties and on how XML content can be brought into this format. This kind of
policy is read and put into action by the formatter.
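To illustrate the one-stylesheet-per-format policy, the sketch below keeps a map from output format to XSLT and applies the matching stylesheet to the filtered XML. The inline stylesheets, the format keys and the title element are hypothetical stand-ins for the real XSLT files. The dictionary join is left out for brevity. The WML stylesheet declares the WML 1.1 document type through xsl:output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OutputFormatter {
    private static final String XSL_NS = "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'";

    // One stylesheet per output format; in the real system each would be a separate XSLT file
    static final Map<String, String> STYLESHEETS = Map.of(
        "html",
        "<xsl:stylesheet version='1.0' " + XSL_NS + ">"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'><ul><xsl:for-each select='//title'>"
      + "<li><xsl:value-of select='.'/></li>"
      + "</xsl:for-each></ul></xsl:template></xsl:stylesheet>",
        "wml",
        "<xsl:stylesheet version='1.0' " + XSL_NS + ">"
      + "<xsl:output method='xml' doctype-public='-//WAPFORUM//DTD WML 1.1//EN'"
      + " doctype-system='http://www.wapforum.org/DTD/wml_1.1.xml'/>"
      + "<xsl:template match='/'><wml><card id='titles'><xsl:for-each select='//title'>"
      + "<p><xsl:value-of select='.'/></p>"
      + "</xsl:for-each></card></wml></xsl:template></xsl:stylesheet>");

    /** Applies the stylesheet registered for the requested format to the filtered XML. */
    static String format(String filteredXml, String format) {
        String xslt = STYLESHEETS.get(format);
        if (xslt == null) {
            throw new IllegalArgumentException("Unsupported output format: " + format);
        }
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(filteredXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Formatting to " + format + " failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Bulletin><article><title>New LHC schedule</title></article></Bulletin>";
        System.out.println(format(xml, "html"));
        System.out.println(format(xml, "wml"));
    }
}
```

Adding a new output format then only means registering another stylesheet, which matches the extensibility argument made above.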

Chapter 5

Advanced Design

5.1 XSL Pipelining


Content
This chapter provides an overview of the advantages of XSL Pipelining and how it is applied to the
new eBulletin system.

5.1.1 Introduction
In the eBulletin application, several modules must be chained to produce the requested output.
Xalan provides a functionality called pipelining, which is more powerful than simply chaining XSL
transformations because it streams input and output. This means that while the output of a first
transformation is still being generated, a second transformation can already start.

5.1.2 Pipelining for the eBulletin


Pipelining plays a key role in the eBulletin system, because several modules are chained together.
Especially between the filter and formatter module such a pipelining occurs. Have a look at the figure
below:

[Figure: a SAX reader supplies the parsed filter XSLT and formatter XSLT, together with the Bulletin XML and the dictionary XML; SAX events flow from the filter to the formatter, and a serializer produces the streamed output]
Figure 5.23 Pipelining of XSLT formatting in the eBulletin application


A SAX reader parses the different stylesheets, and the business rules are applied with the
appropriate parameters. The filter passes its output as SAX events directly to the formatter, which
processes them together with the XML input, namely a Bulletin XML-file and a dictionary. Finally,
the serializer writes the resulting SAX events in the appropriate order as streamed output.
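A pipeline of this kind can be sketched with the standard JAXP SAX API. Each compiled stylesheet is wrapped in a TransformerHandler, and the filter's result is a SAXResult that forwards its SAX events straight into the formatter, so no intermediate XML file is written. The two inline stylesheets are simplified, hypothetical versions of the filter and formatter. Only the chaining mechanism is the point here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslPipeline {
    static final String NS = "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'";

    // Stage 1 (filter): copy everything except titles in another language
    static final String FILTER =
        "<xsl:stylesheet version='1.0' " + NS + "><xsl:param name='lang' select=\"'en'\"/>"
      + "<xsl:template match='@*|node()'><xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
      + "<xsl:template match='title'><xsl:if test='@lang = $lang'><xsl:copy-of select='.'/></xsl:if></xsl:template>"
      + "</xsl:stylesheet>";

    // Stage 2 (formatter): render the remaining titles as a list
    static final String FORMATTER =
        "<xsl:stylesheet version='1.0' " + NS + "><xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'><ul><xsl:for-each select='//title'><li><xsl:value-of select='.'/></li></xsl:for-each></ul></xsl:template>"
      + "</xsl:stylesheet>";

    static String run(String xml, String lang) {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("This XSLT processor cannot chain transformations via SAX");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;
        try {
            Templates filterSheet = stf.newTemplates(new StreamSource(new StringReader(FILTER)));
            Templates formatterSheet = stf.newTemplates(new StreamSource(new StringReader(FORMATTER)));
            TransformerHandler filter = stf.newTransformerHandler(filterSheet);
            TransformerHandler formatter = stf.newTransformerHandler(formatterSheet);
            filter.getTransformer().setParameter("lang", lang);

            StringWriter out = new StringWriter();
            filter.setResult(new SAXResult(formatter));   // filter output = SAX events into the formatter
            formatter.setResult(new StreamResult(out));   // formatter output = serialized text

            // An identity transformer acts as the reader that feeds SAX events into the first stage
            Transformer reader = stf.newTransformer();
            reader.transform(new StreamSource(new StringReader(xml)), new SAXResult(filter));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Pipeline failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Bulletin><title lang='en'>New LHC schedule</title>"
                   + "<title lang='fr'>Nouveau calendrier du LHC</title></Bulletin>";
        System.out.println(run(xml, "fr"));
    }
}
```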

Stylesheet Compilation
Content
This chapter provides information about what stylesheet compilation is and how it can be used to
accelerate the response time of the eBulletin system.

5.1.3 Introduction
XSLT is a programming language, expressed using XML syntax. This is not for the benefit of the
computer, but rather for human interpretation. Before the stylesheet can be processed, it must be
converted into some internal machine-readable format. This process should sound familiar, because it
is the same process used for every high-level programming language. The programmer works in terms
of the high-level language, and an interpreter or compiler converts this language into some machine
format that can be executed by the computer.

Repeating this parsing and conversion for every request would waste time. A better approach is to
parse the XSLT stylesheet once, compile it into a machine-readable format, and then keep that
representation in memory for repeated use. This is called stylesheet compilation and is no different in
concept from the compilation of any programming language.

5.1.4 Templates API


Different XSLT processors implement stylesheet compilation differently, so JAXP includes the
javax.xml.transform.Templates interface to provide consistency. This is a relatively simple interface
with the following API:
public interface Templates {
    java.util.Properties getOutputProperties();
    javax.xml.transform.Transformer newTransformer()
        throws TransformerConfigurationException;
}

Figure 5.24 The javax.xml.transform.Templates API


A Templates instance is used to create XSLT transformers, as shown in the figure below:

Figure 5.25 Relationship between Templates and Transformer


Thread safety is an important issue in any Java application, particularly in a web context where many
users share the same stylesheet. As Figure 5.25 illustrates, an instance of “Templates” is thread-safe
and represents a single stylesheet. During the transformation process, however, the XSLT processor
must maintain state information and output properties specific to the current client. For this reason, a
separate “Transformer” instance must be used for each concurrent transformation.
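Putting the Templates and Transformer roles together, a servlet can compile each stylesheet once and create a new Transformer per request. The sketch below assumes that stylesheets are looked up by name through a loader function. The loader is hypothetical; in practice the stylesheets would be read from XSLT files. The sketch also counts compilations so that the caching can be observed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StylesheetCache {
    static final String COUNT_TITLES =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='count(//title)'/></xsl:template>"
      + "</xsl:stylesheet>";

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> compiled = new ConcurrentHashMap<>();
    private final Function<String, String> loader; // stylesheet name -> XSLT text (hypothetical)
    final AtomicInteger compilations = new AtomicInteger();

    StylesheetCache(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Compiles each stylesheet at most once; the thread-safe Templates object is shared. */
    Templates templates(String name) {
        return compiled.computeIfAbsent(name, key -> {
            synchronized (factory) { // TransformerFactory is not guaranteed to be thread-safe
                try {
                    compilations.incrementAndGet();
                    return factory.newTemplates(new StreamSource(new StringReader(loader.apply(key))));
                } catch (TransformerConfigurationException e) {
                    throw new IllegalStateException("Cannot compile stylesheet " + key, e);
                }
            }
        });
    }

    /** Each call gets its own Transformer, because a Transformer keeps per-run state. */
    String transform(String name, String xml) {
        try {
            Transformer transformer = templates(name).newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation with " + name + " failed", e);
        }
    }

    public static void main(String[] args) {
        StylesheetCache cache = new StylesheetCache(name -> COUNT_TITLES);
        System.out.println(cache.transform("count", "<a><title/><title/></a>"));
        System.out.println(cache.transform("count", "<a><title/></a>"));
        System.out.println("compilations: " + cache.compilations.get());
    }
}
```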

Chapter 6

Implementation

6.1 Classes
Content
In the previous chapters we have covered all aspects of the system design, which together give an
overview of the whole system definition. Let us now have a look at the implementation of the system.

6.1.1 Implemented Classes


The new eBulletin application uses three types of classes:
Servlets
Utilities
Extensions

The following lines will cover the characteristics of each of these types.

Servlets
The Servlets are the heart of the eBulletin application. They process incoming HTTP requests,
transform parameters so that they can be used for further stylesheet processing, invoke stylesheet
processors, interact with the utility classes and external class libraries, and finally return the
transformed data to the client.

The following Servlets were implemented:


input: receives the data access information and the issue number of a released bulletin, and
implements the framework for the “transformer” and “connector” modules.
output: receives the data output information and the issue number of a prepared eBulletin issue,
and implements the framework for the “filter” and “formatter” modules.
prefs: retrieves the personalized eBulletin data of the connected user and displays it as preferences.
setPrefs: stores the personalized eBulletin data of the connected user.
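The output Servlet chains the “filter” and “formatter” modules, and the conclusion lists stylesheet pipelining among the implemented techniques. One way to sketch such a pipeline with standard JAXP is to connect two “TransformerHandler” objects from a “SAXTransformerFactory”, so that the result of the first stylesheet streams directly into the second without an intermediate file. The class name “OutputPipeline” is hypothetical; this is an illustration of the technique, not the application's actual code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical two-stage pipeline: filter stylesheet, then formatter stylesheet. */
class OutputPipeline {
    static String run(String xml, String filterXsl, String formatterXsl) {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
                throw new IllegalStateException("SAX-based pipelining not supported");
            }
            SAXTransformerFactory stf = (SAXTransformerFactory) tf;
            Templates filter = stf.newTemplates(new StreamSource(new StringReader(filterXsl)));
            Templates formatter = stf.newTemplates(new StreamSource(new StringReader(formatterXsl)));

            // Stage 2: the formatter writes the final output
            TransformerHandler formatStage = stf.newTransformerHandler(formatter);
            StringWriter out = new StringWriter();
            formatStage.setResult(new StreamResult(out));

            // Stage 1: the filter's SAX events are fed directly into the formatter
            TransformerHandler filterStage = stf.newTransformerHandler(filter);
            filterStage.setResult(new SAXResult(formatStage));

            // Push the input document into stage 1 via an identity transformation
            stf.newTransformer().transform(new StreamSource(new StringReader(xml)),
                                           new SAXResult(filterStage));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Pipeline failed", e);
        }
    }
}
```

With a filter that removes image elements and a text formatter, the formatter only ever sees the filtered document, which is the separation of concerns the two modules are meant to provide.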


Utilities


Modules
Content
This chapter gives an overview of how the modules introduced in the design part of this report are
implemented. All modules except the Transformer module are implemented as XML/XSL
transformations, which keeps them easy to understand and to modify.

6.1.2 Transformer
The transformer module transforms a result set from a relational database into XML. As described in
the design part, the transformer uses a library called XLE for this transformation. XSL cannot be
applied here, because the input data is not XML. However, the implementation can still be kept at a
high level of abstraction, because XLE allows the process to be defined by means of an extended DTD,
called DTDSA (see the design part of this report for more information about this file).

Besides the DTDSA definition, the XLE transformer has to be configured for database access using the
configuration file “access.cfg”. It specifies the JDBC driver and the access to the relational database
(including username and password). It looks as follows:
org.gjt.mm.mysql.Driver
jdbc:mysql://cdsdb.cern.ch/AGE?user=[username]&password=[password]

Figure 6.26 Part of the XLE configuration file "access.cfg"
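How XLE itself reads “access.cfg” is not described in this report. Assuming the file holds just the two lines shown (driver class first, then the JDBC URL), a plain JDBC sketch of how such a file could be read and used might look like this. The class “AccessConfig” and its methods are hypothetical, and actually connecting requires the MySQL driver on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for a two-line "access.cfg": driver class, then JDBC URL. */
final class AccessConfig {
    final String driverClass;
    final String jdbcUrl;

    private AccessConfig(String driverClass, String jdbcUrl) {
        this.driverClass = driverClass;
        this.jdbcUrl = jdbcUrl;
    }

    static AccessConfig parse(Reader in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(in)) {
            String line;
            while ((line = r.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    lines.add(line); // skip blank lines
                }
            }
        }
        if (lines.size() < 2) {
            throw new IOException("expected a driver class and a JDBC URL");
        }
        return new AccessConfig(lines.get(0), lines.get(1));
    }

    /** Loads the driver explicitly (needed for pre-JDBC-4 drivers) and opens a connection. */
    Connection connect() throws ClassNotFoundException, SQLException {
        Class.forName(driverClass);
        return DriverManager.getConnection(jdbcUrl);
    }
}
```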

The seminar information comes from the agenda database. Unfortunately, some of it is only available in
HTML, so a special conversion is needed to keep the generated XML file valid. In particular, all HTML
tags must be turned into text, which can be done by replacing every “<” with “&lt;”. Because no
“replace” command is available for XLE transformations, a new class had to be implemented and
added to the XLE library. Have a look at the implemented class:
/**
 * Title: Bulletin2PDA
 * Description: Diploma Project
 * Copyright: Copyright (c) 2002
 * Company: EIA-FR/CERN
 * @author Dominik Stankowski
 * @version 1.0
 */
public class XLEMyTransformer {

    /**
     * Replaces every occurrence of str_search in str_text with str_replace.
     * The search string is matched literally as a whole, so adjacent,
     * leading and trailing occurrences are all replaced.
     */
    public static String replace(String str_text, String str_search, String str_replace) {
        if (str_text == null) {
            return "";                        // null fields become empty text
        }
        if (str_search == null || str_search.length() == 0) {
            return str_text;                  // nothing to search for
        }
        String replacement = (str_replace == null) ? "" : str_replace;
        StringBuffer b = new StringBuffer(str_text.length());
        int from = 0;
        int pos;
        while ((pos = str_text.indexOf(str_search, from)) != -1) {
            b.append(str_text.substring(from, pos)); // text before the match
            b.append(replacement);                   // substitute the match
            from = pos + str_search.length();        // continue after the match
        }
        b.append(str_text.substring(from));          // remaining tail
        return b.toString();
    }
}

Figure 6.27 Extension for the XLE library (replace characters during the transformation process)
This class was added to the XLE library, which was then renamed “XLE_ds.jar”.
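Replacing only “<” neutralizes the tags, but an XML parser will still reject a bare “&”, for example in an HTML entity such as “&nbsp;”. A slightly more defensive variant, sketched here as an assumption rather than as part of the implemented extension, escapes “&” first and then the angle brackets, using the standard “String.replace(CharSequence, CharSequence)”. The class name “XmlTextEscaper” is hypothetical.

```java
/** Hypothetical helper: makes an HTML fragment safe to embed as XML character data. */
final class XmlTextEscaper {
    private XmlTextEscaper() {
    }

    static String escape(String html) {
        if (html == null) {
            return "";
        }
        // "&" must be escaped first, otherwise the "&" of "&lt;" would be escaped again
        return html.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;"); // avoids the forbidden sequence "]]>"
    }
}
```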

6.1.3 Connector
The connector module joins the XML data provided by different sources and transforms it into the
XML model of the eBulletin application using an XML/XSL transformation. There are two data sources:
article information in MARC-21 XML format and converted seminar data in XML format.
Article information has to be extracted from the MARC-21 format and put into the new model. For
this purpose, an “xsl:key” declaration is used:
<xsl:key name="ArticleSet" match="marc:collection/marc:record"
use="translate(marc:datafield[@tag='650']/marc:subfield[@code='a'],
'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')" />

Figure 6.28 Usage of "xsl:key" function – key definition


The matching records can then be retrieved with the “key()” function and mapped into the corresponding elements:
<xsl:for-each select="key('ArticleSet', 'news articles')">
<xsl:sort select="marc:datafield[@tag='909' and @ind1='C' and
@ind2='4']/marc:subfield[@code='c']/." />
  <xsl:call-template name="report" />
</xsl:for-each>
Figure 6.29 Usage of "xsl:key" function – set retrieval
The mapping is done by a dedicated template which defines how the elements of the two XML models
correspond (see the DTD in the design part of this report for the exact mapping of each element).
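To show how the key declaration and the “key()” lookup work together, the following sketch runs a cut-down version of the two fragments above with the JDK's built-in XSLT processor. The “marc” prefix is assumed to be bound to the MARC 21 slim namespace; the sample records, the class name “ArticleKeyDemo”, and the one-line stand-in for the “report” template (which simply prints the 245$a title) are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ArticleKeyDemo {
    // Assumed namespace binding for the "marc" prefix (MARC 21 slim)
    static final String MARC_NS = "http://www.loc.gov/MARC21/slim";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:marc='" + MARC_NS + "'>"
      + "<xsl:output method='text'/>"
      // Index every record by its lower-cased 650$a category
      + "<xsl:key name='ArticleSet' match='marc:collection/marc:record'"
      + " use=\"translate(marc:datafield[@tag='650']/marc:subfield[@code='a'],"
      + " 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')\"/>"
      + "<xsl:template match='/'>"
      // Retrieve the set of news articles, sorted by 909C4$c
      + "<xsl:for-each select=\"key('ArticleSet', 'news articles')\">"
      + "<xsl:sort select=\"marc:datafield[@tag='909' and @ind1='C' and @ind2='4']"
      + "/marc:subfield[@code='c']\"/>"
      + "<xsl:call-template name='report'/>"
      + "</xsl:for-each></xsl:template>"
      // Stand-in for the real mapping template: print the title (245$a)
      + "<xsl:template name='report'>"
      + "<xsl:value-of select=\"marc:datafield[@tag='245']/marc:subfield[@code='a']\"/>"
      + "<xsl:text>;</xsl:text></xsl:template>"
      + "</xsl:stylesheet>";

    static String field(String tag, String ind1, String ind2, String code, String value) {
        return "<marc:datafield tag='" + tag + "' ind1='" + ind1 + "' ind2='" + ind2 + "'>"
             + "<marc:subfield code='" + code + "'>" + value + "</marc:subfield>"
             + "</marc:datafield>";
    }

    static String record(String category, String order, String title) {
        return "<marc:record>"
             + field("650", " ", " ", "a", category)
             + field("909", "C", "4", "c", order)
             + field("245", " ", " ", "a", title)
             + "</marc:record>";
    }

    static String collection(String... records) {
        return "<marc:collection xmlns:marc='" + MARC_NS + "'>"
             + String.join("", records) + "</marc:collection>";
    }

    static String run(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the key value is lower-cased with “translate()”, records categorized as “News Articles” and “NEWS ARTICLES” end up in the same set, while other categories are ignored.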


Seminar data is already brought into the right XML model during the transformation from relational
data to XML. This keeps the XSL transformations in the connector module to a minimum: a simple
“xsl:copy-of” instruction is sufficient:
<xsl:copy-of select="/" />

Figure 6.30 The “xsl:copy-of” function


The resulting XML file contains all the bulletin data for one issue and can be saved in the
“xml/bulletin” directory.


Global Variables
Content
This chapter gives an overview of how path information and global variables are stored in the
“Bulletin 2 PDA” application.

6.1.4 Path Information


Path information and database access settings are stored in the main configuration file “web.xml”. It
is read when the Servlet container starts the web application, and each context parameter can be
retrieved in a Servlet with the command “getServletContext().getInitParameter(String name)”.

The following paragraphs show some examples of path information stored in the “web.xml” file.

Path to file with default values


A simple path to the “default.xml” file, which contains default values for various parameters such as
language and issue.

Path to formatter XSL


The final formatting XSL is chosen automatically using the parameters {DEVDE and {LEVEL},
which correspond to keywords in the filename. These keywords can appear anywhere in the filename
and could even specify a sub-directory.

SL in the configuration file "web.xml"

Search query for CDS Search (Beta)


The search query is built from an address and placeholders: {WEEK} and {YEAR} can be placed
anywhere in the query and are dynamically replaced with the values sent to the Servlet. This makes the
application more independent of the search system.

Figure 6.31 Parameter QUERY_ART_XML in the configuration file "web.xml"
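The placeholder mechanism can be sketched as a simple template expansion: every {NAME} token in the configured string is replaced by the matching request value, and unknown tokens are left untouched. This is an illustrative sketch; the class “PlaceholderExpander” and the sample URL in the usage example are hypothetical, and the same approach would apply to the filename pattern of the formatter XSL.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical expansion of {NAME} placeholders in a configured string. */
final class PlaceholderExpander {
    // Matches tokens such as {WEEK} or {YEAR}
    private static final Pattern TOKEN = Pattern.compile("\\{([A-Z_]+)\\}");

    private PlaceholderExpander() {
    }

    static String expand(String template, Map<String, String> values) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Unknown placeholders are kept as they are
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement makes "$" and "\" in values literal
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For example, with WEEK=24 and YEAR=2002, a query such as “http://example.org/search?p=week:{WEEK}/{YEAR}” would expand to “http://example.org/search?p=week:24/2002”. Values that may contain special characters would additionally need URL encoding before being inserted into a query.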


These examples show how the different parameters are stored in the “web.xml” file.

6.1.5 Global Variables


Global variables are also stored in the “web.xml” configuration file. The following figure gives some
examples:
<context-param>
<param-name>SEM_START_WEEKDAY</param-name>
<param-value>0</param-value>
</context-param>
<context-param>
<param-name>SEM_START_WEEK_ADJUST</param-name>
<param-value>0</param-value>
</context-param>
<context-param>
<param-name>SEM_DAY_RANGE</param-name>
<param-value>13</param-value>
</context-param>

Figure 6.32 Global variables in the configuration file "web.xml"


The example above specifies how seminar information is retrieved. The CERN eBulletin lists the
seminars of the two weeks following an issue: the period normally starts on the Monday after the issue
is published and ends two weeks later, on a Sunday. Because these values may have to change to meet
other user needs, they are defined as parameters in the web application configuration file, where they
are easy to modify.
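The report does not spell out how the three parameters are combined. Under the assumption that SEM_START_WEEKDAY is a day offset from the Monday following publication, SEM_START_WEEK_ADJUST a week offset, and SEM_DAY_RANGE the number of days added to the start date, the default values of Figure 6.32 reproduce the window described above. The class “SeminarWindow” is hypothetical and uses the modern “java.time” API for brevity.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

/** Hypothetical computation of the seminar window for one issue. */
final class SeminarWindow {
    final LocalDate start;
    final LocalDate end;

    SeminarWindow(LocalDate publication, int startWeekday, int weekAdjust, int dayRange) {
        // First Monday strictly after the publication date
        LocalDate nextMonday = publication.with(TemporalAdjusters.next(DayOfWeek.MONDAY));
        this.start = nextMonday.plusWeeks(weekAdjust).plusDays(startWeekday);
        this.end = start.plusDays(dayRange); // inclusive last day of the window
    }

    boolean contains(LocalDate day) {
        return !day.isBefore(start) && !day.isAfter(end);
    }
}
```

With the values 0, 0 and 13, an issue published on Wednesday 12 June 2002 would list the seminars from Monday 17 June to Sunday 30 June 2002.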


Resources
Content
In this chapter I explain the use of default values and dictionaries in the “Bulletin 2 PDA”
application.

6.1.6 Default Values


The eBulletin system detects which device connects to the application. Sometimes, however, detection
is not possible, e.g. if an unknown browser is used. To provide some content even when no
parameters are defined, a default value file was created. The first time output is generated, this file is
loaded into the system, where it serves as a “life belt” for insufficiently specified requests.

The file is stored in XML format, so it remains easy to configure:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<Preferences>
<Default>
  <Lang>eng</Lang>
<Issue>latest</Issue>
<Out>response</Out>
<Img>show</Img>
<Int>false</Int>
<Dev>simple</Dev>
</Default>
</Preferences>
Figure 6.33 Default values are specified in the "default.xml" file
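A minimal sketch of the fallback logic, assuming the structure shown in Figure 6.33: the children of “<Default>” are read once into a map, and any parameter that is missing or empty in a request falls back to that map. The class “DefaultValues” and the merge rule (request values take precedence) are assumptions, not the application's actual code.

```java
import java.io.InputStream;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical loader for "default.xml" with request fallback. */
final class DefaultValues {
    private final Map<String, String> defaults;

    DefaultValues(InputStream xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(xml).getDocumentElement();
            Element def = (Element) root.getElementsByTagName("Default").item(0);
            Map<String, String> m = new HashMap<>();
            // Each child element of <Default> becomes one name/value pair
            for (Node n = def.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    m.put(n.getNodeName(), n.getTextContent().trim());
                }
            }
            this.defaults = Collections.unmodifiableMap(m);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read default values", e);
        }
    }

    /** Request parameters win; missing or empty ones are filled from the defaults. */
    Map<String, String> resolve(Map<String, String> request) {
        Map<String, String> result = new HashMap<>(defaults);
        for (Map.Entry<String, String> e : request.entrySet()) {
            if (e.getValue() != null && !e.getValue().isEmpty()) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```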

6.1.7 Published Issues


To know which issues are available for output on mobile devices, the file “issue.xml” was created. It
lists the issues produced by the transformer and connector modules that are available for output
processing. To keep the system as flexible as possible, an XSL file (issue.xsl) specifies how the latest
issue is determined. With this information, the value “latest” can even be used for the parameter
“issue” (see the example above). The respective files are shown below:

Figure 6.34 The "issue.xml" file specifies the available issues
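The content of “issue.xsl” is not reproduced in this report. Assuming a hypothetical “issue.xml” layout in which each published issue is an “<Issue>” element with numeric “year” and “week” attributes, resolving “latest” could be as simple as sorting in descending order and taking the first entry. The class “LatestIssue” and the “week/year” output format are also assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical resolution of "latest" with an XSL sort. */
final class LatestIssue {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//Issue'>"
      // Numeric sort, so that week 24 ranks above week 9
      + "<xsl:sort select='@year' data-type='number' order='descending'/>"
      + "<xsl:sort select='@week' data-type='number' order='descending'/>"
      // position() refers to the sorted order, so 1 is the latest issue
      + "<xsl:if test='position() = 1'>"
      + "<xsl:value-of select=\"concat(@week, '/', @year)\"/>"
      + "</xsl:if>"
      + "</xsl:for-each></xsl:template></xsl:stylesheet>";

    static String latest(String issueXml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(issueXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping this rule in a stylesheet rather than in Java means the definition of “latest” can be changed without recompiling the application, which matches the flexibility goal stated above.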


The Application
Content
This chapter gives some example outputs of the eBulletin on different devices.

6.1.8 Output in HTML Format for PDAs


The following figures show example outputs to a Palm device:

Figure 6.35 CERN eBulletin Mainpage
Figure 6.36 Category News Articles


Figure 6.37 A News Article (with image)
Figure 6.38 A News Article (without image)

Figure 6.39 Seminar overview
Figure 6.40 Personal content settings


6.1.9 Output in simple HTML Format


The following figures show example outputs to a standard web browser:

Figure 6.41 Output to a standard web browser (category News Articles)

Figure 6.42 Output to a standard web browser (News Article)


Chapter 7

Conclusion

7.1 Achieved Results


Content
This is the first of three chapters giving an overview of the results achieved in the project “Bulletin 2
PDA”. This chapter presents the project results, the problems encountered, and my personal progress.

7.1.1 Project Results


The following tasks were completed successfully:
Analysis of the current eBulletin system and of the user needs
Project definition and project requirement specification
Analysis of current mobile devices and access to data sources
Analysis of web application broadcasting to different mobile devices
Design of an underlying XML-model according to the eBulletin structure
Design of a new eBulletin system with the improvements summarized in Section 7.1.3
Design and implementation of a modular and highly extensible web application
Design and implementation of advanced programming techniques and utilities (stylesheet
precompilation, stylesheet cache, stylesheet pipelining etc.)
Generation of different output formats
Personalization of eBulletin content for AvantGo users and cookie-enabled browsers
Implementation of a transformer which generates XML out of a relational dataset
Implementation of a connector which gathers information from different databases
Implementation of a filter which filters elements such as external links, images, language
specific content, protected information etc.

7.1.2 Encountered Challenges


I encountered several challenges during this project:


CERN: it is neither a company nor a university but a scientific organization with an international
character :-)
The current eBulletin system could not be reused for broadcasting to mobile devices, so a
completely new system had to be designed.
Analysing an existing application and working out the project requirement specifications.
Not all data sources were available in XML, so a converter had to be found to transform
relational data into XML.
The running systems at CDS were not well documented, so many meetings had to be held to
gather information about them. Parts of the new system that had already been designed had to
be redesigned to accommodate undocumented functions or to respect environment-specific
definitions.
Article and seminar information was specified very restrictively, so many cases had to be
considered during the design and implementation of the system.
Article and seminar information (full text, summary and title) was sometimes provided only in
HTML, so many painful transformations were needed to obtain the data completely free of
formatting, or to modify it without affecting the HTML tags.
Moreover, image links were not provided as separate information in the XML file; they had
to be extracted from the full text. Furthermore, images were not always provided in an
icon format.

These are only some of the challenges I encountered, and it goes without saying that I learned many
new things from them, which greatly contributed to my personal progress.


eBulletin System Improvements


Content
This chapter gives an overview of the improvements in system design, comparing the current and the
new eBulletin systems.

7.1.3 System Improvements


The main improvements of the new eBulletin system over the current one are listed in the table below:

Aspect | Current eBulletin system | New eBulletin system
Application level | Independent scripts | Web application
Architecture | 2-tier architecture | 3-tier architecture
Article data | Article data is received from the “old” Weblib system | Article data is received from the new CDS Search system
Storage of an eBulletin issue | Processed input is stored in the final eBulletin format (HTML/PHP) | Processed input is stored in an intermediate XML file and can be used for other services
Modular system design | Not a modular approach | Modules are defined and implemented separately
Global variables | Global variables are defined in the code | Global variables are defined in the global variable file of the application (web.xml)
Default values | Default values are hard-coded in the system | Default values are stored in a separate XML file
Dictionary | Dictionary is integrated in the definition of global variables | Dictionary is a separate, structured XML file which remains modifiable and reusable
Dependency on other applications | Highly dependent on other applications, namely the search engines which provide content for the system; database model changes would affect the whole system | Higher independence of other applications; database changes would only affect the transformer or the connector module
Output format | Output is only possible in HTML (PC/Mac) | Output is possible in HTML, HTML for PDAs, HTML for AvantGo, WML, PDF
Output interface | No output to other systems possible, because no underlying data format is used | XML is used as the underlying data format, so interaction with other applications would be a possible extension
Output extensibility | No other outputs can be provided in the future | Output interface can be extended to PS, RTF etc.
Data personalization | Data personalization is not provided | Data personalization is provided and extensible
Device detection | No device detection | Detects connecting devices and proposes appropriate output
Code implementation, reusability and extensibility | In PHP, not object oriented, limited | In Java, object oriented, high levels of reusability and extensibility
Module implementation language | Transformations in PHP | Transformations in XSL (easily modifiable)
Security settings | Must be hard-coded or set at server level | Security settings at application level
Documentation | Concept documentation | Documentation covers analysis, design, and implementation, plus links to related resources


A Glimpse into the Future


Content
This chapter provides additional information about possible future work on the system.

7.1.4 Work to do on the new eBulletin System


The following list points out what should be completed to get the most out of the new eBulletin
system:
XLE license: The XLE module is licensed from IBM alphaWorks and is normally limited to a
90-day evaluation period. Further license agreements should be planned.
Adaptation of different outputs: Mainly because of HTML coding in some fields of the article
and seminar information, some outputs could not be configured completely; these outputs
should be extended to create a stable release.
Because the system is a prototype, no caching strategies were implemented for the AvantGo
system. Caching would, however, be a good technique to minimize the connections from the
AvantGo server to the eBulletin system. More information about this topic can be found
on the AvantGo homepage.

These are a few points that should be addressed in order to create a stable release.

7.1.5 Possible Considerations for the Future


Moreover there are some points which would be interesting to pursue in the future:
Standard HTML output: If desired, the standard HTML output could be integrated into the
design of the current eBulletin.
Feedback from PDA users should be taken into account to continue improving the eBulletin
system.
Migrating the agenda database to CDS Search could improve performance and make
on-the-fly generation of the eBulletin from the database possible.
User profiles would also be very interesting for standard web users; however, this point
depends heavily on the ongoing CERN project that aims to provide a “common login” for
CERN applications.


Issues of the eBulletin could be “floating”: content would be provided to the eBulletin at the
actual time of submission, removing the artificial delay between submission and publication
(today, eBulletin articles are not published before the paper version’s issue).
MIDlets could add functions to the eBulletin, such as transferring seminar dates and times
into the calendar database of the PDA.

These points deserve serious consideration for future development of the eBulletin system.
