Object Model of Delphi Data: DELPHI Collaboration DELPHI 98-157 PROG-235 6 October, 1998

DELPHI Collaboration DELPHI 98-157 PROG-235
6 October, 1998
Object model of Delphi Data

N. Smirnov, F. Carena, Tz. Spassoff

Abstract
This note presents the design of an object model of Delphi data as the
first step in the development of an Object Oriented analysis framework
for the Delphi experiment. The project goal is to preserve the Delphi data
for many years, providing a working environment suitable for the current
generation of physicists as well as for future ones.
1 INTRODUCTION
Object oriented design is a key technology for the next generation of HEP software.
The creation of an object model of Delphi data is the first indispensable step in
the development of an Object Oriented (OO) analysis framework in the Delphi
experiment.
The aims of this project are:
• to provide a framework for moving toward Object Oriented Programming
  (OOP) with C++ as the implementation language;
• to gain software development and physics analysis experience with the new
  HEP programming technologies;
• to attract and maintain interest in analysing the Delphi data far beyond
  the end of LEP data taking;
• to prepare for saving and archiving the Delphi data for a long period of time
  (some 30 years), over which it is assumed that today's computing environment
  (existing operating systems, the Fortran programming language, the current
  CERN Program Library etc.) will no longer exist.
In carrying out the project, three technologies new to HEP have been applied:
OOP, the C++ programming language and an Object Database Management
System (ODBMS) ¹.
OOP is a programming style that has been known for about 30 years (e.g. Simula67,
Smalltalk70) and has recently been gaining widespread acceptance in HEP. It is
based on two major ideas: Abstract Data Types (ADT) and Hierarchical Types. The
concept of the ADT is that an object's type (class) should be defined by a name
and a set of proper values and operations rather than by its storage structure, which
should be hidden. Hierarchical types allow one to define general interfaces that make
the commonality between objects explicit through inheritance relationships.
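The two ideas can be illustrated with a minimal sketch. It is written in Python purely for brevity (the project itself targets C++), and every name in it (`Vector3`, `ReconstructedObject`, `Vertex`, `Track`) is hypothetical rather than taken from the Delphi software.

```python
from abc import ABC, abstractmethod
import math


class Vector3:
    """Abstract data type: defined by its operations, not by its storage."""

    def __init__(self, x, y, z):
        # Storage is private by convention; it could be changed (e.g. to
        # polar coordinates) without affecting the public operations.
        self._xyz = (float(x), float(y), float(z))

    def magnitude(self):
        return math.sqrt(sum(c * c for c in self._xyz))


class ReconstructedObject(ABC):
    """General interface making the commonality between objects explicit."""

    @abstractmethod
    def position(self):
        """Return a Vector3 locating the object."""

    def distance_from_origin(self):
        # Works for every subclass, using only the common interface.
        return self.position().magnitude()


class Vertex(ReconstructedObject):
    def __init__(self, x, y, z):
        self._point = Vector3(x, y, z)

    def position(self):
        return self._point


class Track(ReconstructedObject):
    def __init__(self, start, momentum):
        self._start = start
        self._momentum = momentum

    def position(self):
        return self._start

    def momentum(self):
        return self._momentum
```

Code that only needs `distance_from_origin()` can treat vertices and tracks alike, which is the kind of reuse the hierarchical types are meant to provide.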
The main advantage of OOP is the possibility to create components, which
are sets of classes united by some logical criteria, a common style or a reliance on
common services. This should make it possible to create so-called "plug-and-play"
software, increasing the reuse of code and design.
C++ is a general-purpose programming language that supports data abstraction
and OOP. It is rapidly becoming the standard in HEP applications. At the same
time, the present object data model design reflects the need to maintain the existing
data analysis algorithms written in FORTRAN while foreseeing a migration to C++
and OOP.
¹ A fourth one, ORB (Object Request Broker), is under investigation.
The creation of an object data model requires an example of a persistent data store
to be based on. The one currently used by Delphi is the data management system
ZEBRA [1], which is not founded on the object approach. When developing the design
of an object data model it is indispensable to also take into account an implementation
founded on the object approach, e.g. an ODBMS ². The present model therefore
considers two different implementation approaches simultaneously: ZEBRA and ODBMS.
Objectivity/DB [2] is used as the available ODBMS. Objectivity/DB based
solutions are being adopted in production by a growing number of experiments
at CERN and elsewhere, and have been selected for a number of high-data-volume
experiments starting soon (BaBar, COMPASS etc.). It is the most probable ODBMS
for archiving the LEP data.
According to the Objectivity/DB Guide, "Objectivity/DB is a distributed client/server
object database management system (ODBMS) for applications that require
flexible information modeling, involve complex relationships, and demand high
performance. Objectivity/DB provides full database functionality with a distributed
architecture that supports networks of heterogeneous platforms." [2]
2 DATA MODEL
The Delphi experimental data pass through a set of processing stages that may
add information to the events, select events based on certain criteria,
or both. The processing starts with the event reconstruction, which aims to determine
physical quantities (vertices, tracks, energy clusters) from the raw data. At this
stage data for the graphical event display and primary DST data are created. Further
selections and analysis based on the underlying physics data (particle identification
etc.) are then performed and Post DST type data are produced. These are the final
data used by physicists for physics analysis. The object data model describes only the
result of the final processing stage and does not take into account the intermediate
ones, nor the raw data.
Delphi events are reconstructed several times, usually with different configuration
parameters, reflecting the improvement in the understanding of the detector
performance and physics goals. Each of these processing stages generates a new
version of the output data and does not overwrite the earlier one, which stays available
and may optionally be deleted later. Data versioning is not considered in the
object model; only the last, and presumably the best, version is included.
² If one creates an object data model, how likely is one to get it right when one's
experience is limited to a single non-object-oriented data management system?
It is important to distinguish between transient and persistent data models.
The persistent model consists of objects whose lifetime exceeds that of the user
programs, i.e. which can be stored on external media. The transient
model is the logical structure seen by users in their physics applications. At
present the differences between the transient model and the persistent model based
on Objectivity/DB are purely technical and will not be discussed in this note.
The object data model consists of three major parts:
• object data part;
• object interfaces;
• datastore specific part.
The object interfaces are closely related to the I/O system and will be described
elsewhere later.
The object model of Delphi has been developed taking into account the following
considerations:
• Delphi data information;
• user access patterns;
• implementation requirements of ZEBRA and Objectivity/DB.
2.1 Data Information Classification

The Delphi data are organised in collections called datasets. Each dataset contains
the events of one year of LEP data taking and corresponds to a given software
reprocessing. The number of events varies substantially from one year to another,
increasing during the LEP1 (1991-1995) and LEP2 (1995-1997) energy periods.
The datasets are subdivided into runs, each covering a data-taking period with
stable LEP conditions. The runs are assigned run numbers by the online system.
They are spread over several files whose length is chosen to be
optimal for the output media and further manipulation.
The runfile is the smallest event collection unit of Delphi. The information associated
with the LEP running conditions, such as the beam energy and position (spot), detector
quality, luminosity etc., is organised on a runfile basis.
Year    Number of runfiles    Number of events
1991    3 894                   713 019
1992    4 302                 1 706 622
1993    2 746                 2 153 805
1994    5 177                 4 983 339
1995    2 984                 3 114 581
1995      516                   215 809
1996    1 804                   409 806
1997    2 459                   997 349
Table 1: Number of runfiles and events per year.
Delphi events consist of:
• Pilot information: Event Identifier (run, event number) and a summary of its
  topological and global physics characteristics;
• Vertices and Tracks;
• Detector information for the tracks;
• Information related to the whole event, e.g. b-tagging, V0 decays;
• Associations between different information elements, e.g. vertices-tracks.
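As an illustration of this composition, the following minimal Python sketch groups the five kinds of information into one event structure. The class names, field names and the use of plain strings for vertices and tracks are assumptions made for the example; they are not the Delphi class definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Pilot:
    """Event identifier plus a summary of global characteristics."""
    run: int
    event_number: int
    summary: Dict[str, float] = field(default_factory=dict)


@dataclass
class Event:
    pilot: Pilot
    vertices: List[str] = field(default_factory=list)
    tracks: List[str] = field(default_factory=list)
    # Detector information for the tracks, keyed by index into `tracks`.
    track_detector_info: Dict[int, str] = field(default_factory=dict)
    # Whole-event information such as b-tagging or V0 decays.
    event_info: Dict[str, object] = field(default_factory=dict)
    # Associations stored as (vertex index, track index) pairs.
    vertex_track_links: List[Tuple[int, int]] = field(default_factory=list)

    def tracks_of_vertex(self, vertex_index: int) -> List[str]:
        """Follow the vertex-track associations for one vertex."""
        return [self.tracks[t] for v, t in self.vertex_track_links
                if v == vertex_index]


# Hypothetical event with two vertices and three tracks.
event = Event(
    pilot=Pilot(run=101, event_number=7),
    vertices=["primary", "secondary"],
    tracks=["trk0", "trk1", "trk2"],
    vertex_track_links=[(0, 0), (0, 1), (1, 2)],
)
```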
Rough estimates of some important Delphi data parameters, biased towards upper
limits, are shown in Table 2 for both the LEP1 and LEP2 energy periods.
Symbol  Parameter                                            Value
s       number of datasets per LEP period                    10
r       number of runfiles per LEP period                    20000
e       number of events per runfile                         10000
v       number of vertices per event                         10
t       number of tracks per event                           100
l       number of unassociated track elements per event      100
h       number of unassociated VD hits per event             100
d       number of V0 decays per event                        50
Table 2: Main Delphi data parameters.
2.2 User access patterns
Optimizing data access is an important goal of the object data model. Better
access performance of the physics applications can be achieved by storing
together the information that is most likely to be used together during the data
analysis. The most frequent user access patterns are established on the basis of the
experience gained in analysing the Delphi data.
At the dataset level, users access either all events of the set sequentially or
a small subset of randomly selected events (typically a few percent). With
increasing processing homogeneity, access to and analysis of several datasets
simultaneously (e.g. all LEP1 data) should be envisaged as well.
At the event level, users need to access not only the event information
as a whole but also different parts of it, depending on the aim of their analysis:
• Event Identifier;
• Pilot information;
• Tracks only;
• Tracks + Vertices;
• Tracks + Vertices + Detector information.
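The two dataset-level patterns described above (a full sequential pass, and a small random selection) can be sketched as follows. This is only an illustration in Python: the function names are hypothetical, and representing the dataset as an in-memory list of event identifiers is an assumption.

```python
import random


def sequential_scan(event_ids):
    """Visit every event of the dataset in stored order."""
    for eid in event_ids:
        yield eid


def random_subset(event_ids, fraction, seed=0):
    """Pick a small random fraction of the events.

    The selection is returned in stored order, so that the reads
    remain as sequential as possible on the storage medium.
    """
    rng = random.Random(seed)
    n_selected = max(1, round(len(event_ids) * fraction))
    chosen = rng.sample(range(len(event_ids)), n_selected)
    return [event_ids[i] for i in sorted(chosen)]
```

For a list of 1000 identifiers, `random_subset(ids, 0.03)` returns 30 of them, which corresponds to the "few percent" case.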
2.3 Implementation requirements

When ZEBRA is used as the persistent data store, only two information groups can
be considered for the event: the Pilot and the rest of the event.
Objectivity/DB as a data store allows the information from different parts of
the event to be clustered in an arbitrary way. This can be done using the
Objectivity/DB storage classes:
• A basic object is the fundamental storage unit, e.g. vertices, tracks etc.;
• A container is a collection of basic objects. They are physically clustered
  together in memory, so access to all basic objects in a single container is very
  efficient;
• A database is a collection of containers;
• A federated database contains user-defined databases and the object model
  (or schema) that describes all public class definitions.
Each basic object is included within a container, each container is included within
a database, and each database belongs to a single federated database.
Objectivity/DB imposes some requirements and restrictions on each of these storage classes.
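The containment hierarchy can be pictured with the following plain Python sketch. It does not use or imitate the Objectivity/DB API; the classes `Container`, `Database` and `FederatedDatabase`, and the `store` method, are hypothetical stand-ins that only mirror the four storage levels and the role of the schema.

```python
class Container:
    """Collection of basic objects that are clustered together."""

    def __init__(self, name):
        self.name = name
        self.objects = []


class Database:
    """Collection of containers."""

    def __init__(self, name):
        self.name = name
        self.containers = {}

    def container(self, name):
        if name not in self.containers:
            self.containers[name] = Container(name)
        return self.containers[name]


class FederatedDatabase:
    """Collection of databases plus the schema shared by all of them."""

    def __init__(self, name, schema):
        self.name = name
        self.schema = set(schema)   # names of the known classes
        self.databases = {}

    def database(self, name):
        if name not in self.databases:
            self.databases[name] = Database(name)
        return self.databases[name]

    def store(self, db_name, container_name, obj):
        # Only objects whose class is described by the schema are accepted.
        kind = type(obj).__name__
        if kind not in self.schema:
            raise TypeError("class %s is not in the schema" % kind)
        self.database(db_name).container(container_name).objects.append(obj)


class Vertex:
    pass


class Track:
    pass


fdb = FederatedDatabase("LEP1", schema={"Vertex", "Track"})
fdb.store("RUN001", "VertexCont", Vertex())
fdb.store("RUN001", "TrackCont", Track())
fdb.store("RUN001", "TrackCont", Track())
```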
Each federated database (FDB) contains by definition a schema describing the
object model. The existence of many FDBs within the same data model makes it
difficult to keep the schemas consistent. Moreover, Objectivity/DB allows
only one FDB to be accessed per process. This implies the need to minimize the
number of FDBs per model. The Delphi object model has only two FDBs,
containing the LEP1 and LEP2 data respectively. Their schemas are identical for
the moment, but this partition allows the model for LEP2 to be modified in
the future if necessary.
The use of multiple databases (DB) and multiple containers is restricted
by Objectivity/DB. The number of DBs per FDB must not exceed 64K. A
DB is a physical file and its size is limited by the current file systems. Taking
these restrictions into account, each DB in the present object model contains the data
of one runfile.
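A quick check of this choice against the quoted limit can be written as a few lines of Python. The sketch assumes that "64K" means 65536 and that, besides one database per runfile, each FDB holds one additional main database (as described in Section 2.4); the function names are hypothetical.

```python
MAX_DATABASES_PER_FDB = 64 * 1024   # "64K", assumed to mean 65536


def databases_needed(runfiles, extra_databases=1):
    """One database per runfile plus the main database."""
    return runfiles + extra_databases


def fits_in_one_fdb(runfiles):
    return databases_needed(runfiles) <= MAX_DATABASES_PER_FDB
```

With the upper estimate of 20000 runfiles per LEP period from Table 2, `fits_in_one_fdb(20000)` is true, so one FDB per LEP period stays well inside the limit.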
The number of containers per DB is limited to 32K. Containers are the clustering
units of Objectivity/DB and their design should reflect the user access patterns
(Section 2.2). Different parts of the same event are held in different containers, e.g.
track container, vertex container etc.
Container / objects                          Number
Vertex container                             e/b
  Vertex objects                             vb
Track container                              e/b
  Track objects                              tb
  Track geometry objects                     tb
TrackDetector container                      e/b
  TrackDetector Info List objects            tb
  VD Hits (associated) objects               10tb
Detector container                           e/b
  Detector Info List objects                 b
  Unass. TE List objects                     b
  Unass. TE objects                          lb
  VD Hits (unass.) objects                   hb
  V0 ID objects                              db
  Dstar List objects                         b
Table 3: Estimation of number of containers per DB and objects per container.
The maximum number of basic objects per container depends on their size and
on some FDB parameters (e.g. the page size). It can vary from a lower limit of 64K
up to 4G. For Delphi data the number of objects can easily exceed the lower
limit, as can be seen from Table 2, so some protection against possible
overflow is needed. The chosen approach is to set a limit on the number of
events per container, thereby restricting the total number of objects in the related
containers. When the number of events exceeds this limit, the next set of
containers is filled.
The estimated number of containers per DB and objects per container is shown
in Table 3, where b stands for the limit on the number of events per container and
the other symbols are defined in Table 2.
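The arithmetic behind these estimates can be sketched in Python. The sketch rests on several assumptions that are not stated in the note: "64K" and "32K" are taken as 65536 and 32768, the object kinds listed under one container in Table 3 are summed to obtain the total per container, and only the four container kinds of Table 3 are counted. The function names are hypothetical.

```python
import math

# Upper estimates from Table 2.
PARAMS = {"e": 10000, "v": 10, "t": 100, "l": 100, "h": 100, "d": 50}

OBJECT_LIMIT = 64 * 1024      # lower limit on basic objects per container
CONTAINER_LIMIT = 32 * 1024   # containers per database


def objects_per_container(b, p=PARAMS):
    """Worst-case object counts per container for b events, after Table 3."""
    return {
        "Vertex": p["v"] * b,
        "Track": 2 * p["t"] * b,                          # tracks + geometry
        "TrackDetector": p["t"] * b + 10 * p["t"] * b,    # info lists + VD hits
        "Detector": 3 * b + (p["l"] + p["h"] + p["d"]) * b,
    }


def largest_safe_b(p=PARAMS, limit=OBJECT_LIMIT):
    """Largest b for which no container exceeds the object limit."""
    worst_per_event = max(objects_per_container(1, p).values())
    return limit // worst_per_event


def containers_per_database(b, p=PARAMS):
    """Each container kind needs ceil(e/b) containers per runfile database."""
    kinds = len(objects_per_container(1, p))
    return kinds * math.ceil(p["e"] / b)
```

Under these assumptions the TrackDetector container is the most populated one (1100 objects per event), which gives a limit of b = 59 events per container and 680 containers per database, far below the 32K limit.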
2.4 Federated Database structure

The final design of the Federated Database structure is based on the considerations
explained above (Sections 2.1-2.3). An FDB consists of one main database, MAIN DB,
and many runfile databases, RUNxxx DB. Their structures are shown in Figure 1 and
Figure 2 respectively.
MAIN DB has one dataset container, DatasetCont, whose objects contain
general information about the datasets. Runfile-related information and the Event
Identifiers (EID) of all events of a runfile are stored in the corresponding runfile
container EIdRxxxCont, one per runfile. Each runfile container has one Run
Information Identifier (RID) object RunInfoId, one Run Information object RunInfo
with runfile-related information such as beam energy and position (spot), luminosity
etc., and one Event Identifier object EIdRxxx for each event belonging to this runfile.
A typical task in physics applications is to locate all EIDs belonging to a
dataset known by its name. This name unambiguously identifies the dataset object
Dataset in the container DatasetCont. The dataset object has many unidirectional
associations to RID objects. From a RID object one can navigate to the
corresponding runfile container EIdRxxxCont and get all EIDs belonging to it.
Each EID has an association to the corresponding Event object, which contains
general information about the event, e.g. the Pilot record. Event objects are held in
the event container EventCont of the runfile database RUNxxx DB (Figure 2).
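The navigation path described above (dataset name, Dataset, RID, runfile container, EID, Event) can be sketched in Python as follows. The classes are simplified, hypothetical stand-ins for the persistent objects, associations are modelled as ordinary object references, and the dataset name "example" is invented for the illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Event:
    pilot: str                       # stands in for the Pilot record


@dataclass
class EventId:
    run: int
    number: int
    event: Event                     # association EID -> Event


@dataclass
class RunfileContainer:              # plays the role of EIdRxxxCont
    event_ids: List[EventId] = field(default_factory=list)


@dataclass
class RunInfoId:                     # RID object
    run: int
    container: RunfileContainer      # navigation RID -> runfile container


@dataclass
class Dataset:
    name: str
    run_ids: List[RunInfoId] = field(default_factory=list)  # Dataset -> RID


def event_ids_of(dataset_cont: Dict[str, Dataset], name: str) -> Iterator[EventId]:
    """Yield all EIDs of the dataset with the given name."""
    dataset = dataset_cont[name]
    for rid in dataset.run_ids:
        yield from rid.container.event_ids


def make_runfile(run: int, n_events: int) -> RunInfoId:
    """Build a small runfile with n_events dummy events (illustration only)."""
    cont = RunfileContainer()
    for number in range(1, n_events + 1):
        event = Event(pilot="pilot %d/%d" % (run, number))
        cont.event_ids.append(EventId(run, number, event))
    return RunInfoId(run, cont)


dataset_cont = {
    "example": Dataset("example", [make_runfile(1, 2), make_runfile(2, 3)]),
}
```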
Events are divided into five parts according to the most likely user access patterns
(see Section 2.2). Each part is held in a separate container: the event object in
EventCont, vertices in VertexCont, tracks in TrackCont, detector information related
to the tracks in TrkDetectorCont, and detector information related to the whole event
in EvtDetectorCont. All parts have bidirectional associations, allowing free
navigation between them.
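A minimal Python sketch of such bidirectional associations is given below. The container names are those used in the text; the `EventPart` class, the `associate` helper and the choice to link every pair of parts directly are assumptions made for the illustration.

```python
CONTAINERS = ("EventCont", "VertexCont", "TrackCont",
              "TrkDetectorCont", "EvtDetectorCont")


class EventPart:
    """One of the five parts of an event, held in its own container."""

    def __init__(self, container, payload=None):
        self.container = container
        self.payload = payload
        self.links = {}              # container name -> associated part


def associate(a, b):
    """Create a bidirectional association between two parts."""
    a.links[b.container] = b
    b.links[a.container] = a


def build_event(payloads):
    """Create the five parts of one event and link every pair of them."""
    parts = {name: EventPart(name, payloads.get(name)) for name in CONTAINERS}
    names = list(parts)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            associate(parts[first], parts[second])
    return parts


parts = build_event({"TrackCont": ["trk0", "trk1"], "VertexCont": ["primary"]})
```

An application that only needs tracks and vertices can start from the track part and follow `links["VertexCont"]`, without touching the detector information containers.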
MAIN DB
  DatasetCont
    Dataset 1, Dataset 2
  EIdRxxxCont
    RunInfoId
    RunInfo
    EIdRxxx
  EIdRyyyCont
    RunInfoId
    RunInfo
    EIdRyyy
Figure 1: Structure of main database.
RUNxxx DB
  EventCont
  VertexCont
  TrackCont
  EvtDetectorCont
  TrkDetectorCont
Figure 2: Structure of runfile databases.
Acknowledgements
We are grateful to Delphi management for their support of this project. Special
thanks to Dirk Duellmann for the helpful discussions and consultations concerning
the Objectivity/DB.
References
[1] J. Zoll et al., "ZEBRA User Guide", CERN Program Library Q100/Q101.
[2] "Using Objectivity/C++", Version 4, Objectivity, Inc.
