Object Model of Delphi Data: DELPHI Collaboration DELPHI 98-157 PROG-235 6 October, 1998


DELPHI Collaboration DELPHI 98-157 PROG-235

6 October, 1998

Object model
of Delphi Data
N.Smirnov
F.Carena
Tz.Spasso

Abstract
This note presents the design of an object model of Delphi data as the
first step in the development of an Object Oriented analysis framework
for the Delphi experiment. The project goals are to preserve the Delphi data
for a period of many years, providing a working environment suitable for
the current generation of physicists as well as for future ones.
1 INTRODUCTION
Object oriented design is a key technology for the next generation of HEP software.
The creation of an object model of Delphi data is the first indispensable step in
the development of an Object Oriented (OO) analysis framework for the Delphi
experiment.
The aims of this project are:
• to provide a framework for moving toward Object Oriented Programming
  (OOP) with C++ as the implementation language;
• to gain software development and physics analysis experience with the new
  HEP programming technologies;
• to attract and maintain interest in analysing the Delphi data far beyond
  the end of LEP data taking;
• to prepare for saving and archiving the Delphi data over a long period of time
  (some 30 years), during which it is assumed that today's computing environment
  (existing operating systems, the Fortran programming language, the current CERN
  Program Library etc.) will no longer be available.

In carrying out the project, three technologies that are new to HEP have been
applied: OOP, the C++ programming language and an Object Database Management
System (ODBMS)¹.
OOP is a programming style that has been known for about 30 years (e.g. Simula67,
Smalltalk70) and has recently been gaining widespread acceptance in HEP. It is based on
two major ideas: Abstract Data Types (ADT) and Hierarchical Types. The
concept of the ADT is that an object's type (class) should be defined by a name
and a set of proper values and operations rather than by its storage structure, which
should be hidden. Hierarchical types allow one to define general interfaces that make
the commonality between objects explicit by using the inheritance relationship.
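As a minimal illustration of these two ideas, the sketch below defines a general interface and two types whose storage is hidden behind operations. Python is used only for brevity. All class and method names (EventComponent, Track, Vertex, summary, momentum) are hypothetical and are not taken from the Delphi model.

```python
import math
from abc import ABC, abstractmethod


class EventComponent(ABC):
    """Hypothetical general interface: the commonality between event parts."""

    @abstractmethod
    def summary(self) -> str:
        """Return a one-line description of the component."""


class Track(EventComponent):
    """Abstract data type: defined by its operations, not by its storage."""

    def __init__(self, px: float, py: float, pz: float) -> None:
        self._p = (px, py, pz)  # storage detail, hidden from users

    def momentum(self) -> float:
        return math.sqrt(sum(c * c for c in self._p))

    def summary(self) -> str:
        return f"Track |p| = {self.momentum():.2f}"


class Vertex(EventComponent):
    """A second concrete type that inherits the same interface."""

    def __init__(self, x, y, z) -> None:
        self._position = (x, y, z)

    def summary(self) -> str:
        x, y, z = self._position
        return f"Vertex at ({x}, {y}, {z})"


def describe(components):
    """Works for any EventComponent, whatever its concrete type."""
    return [c.summary() for c in components]
```

Code such as describe depends only on the interface, so the internal representation of a track could change without affecting it.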
The main advantage of OOP is the possibility of creating components, which
are sets of classes united by some logical criteria, a common style or a reliance on
common services. This should make it possible to create so-called "plug-and-play"
software, increasing the reuse of code and design.
C++ is a general-purpose programming language that supports data abstraction
and OOP. It is rapidly becoming the standard in HEP applications. At the same
time, the present object data model design reflects the need to maintain the existing
data analysis algorithms written in FORTRAN while foreseeing a migration to C++
and OOP.

¹ A fourth one, ORB (Object Request Broker), is under investigation.

The creation of an object data model requires an example of a persistent data store
to be based on. The one currently used by Delphi is the data management system
ZEBRA [1], which is not founded on the object approach. When developing the design of an
object data model it is indispensable to also take into account an implementation
founded on the object approach, e.g. an ODBMS². The present model considers
two different implementation approaches simultaneously: ZEBRA and ODBMS.
Objectivity/DB [2] is used as the available ODBMS. Objectivity/DB based
solutions are being adopted in production by a growing number of experiments
at CERN and outside, and have been selected for a number of high data volume
experiments starting soon (BaBar, COMPASS etc.). It is the most probable ODBMS
for archiving the LEP data.
According to the Objectivity/DB Guide, "Objectivity/DB is a distributed client/server
object database management system (ODBMS) for applications that require
flexible information modeling, involve complex relationships, and demand high
performance. Objectivity/DB provides full database functionality with a distributed
architecture that supports networks of heterogeneous platforms." [2]

2 DATA MODEL
The Delphi experimental data are subject to a set of processing stages that may
either add information to the events, or make selections based on certain criteria,
or both. The processing starts with the event reconstruction, which aims to determine
physical quantities (vertices, tracks, energy clusters) from the raw data. At this
stage the data for the graphical event display and the primary DST data are created. Further
selections and analysis based on the underlying physics data (particle identification
etc.) are then performed and Post DST type data are produced. These are the final physicist
data that are used for physics analysis. The object data model describes only the
result of the final processing stage and does not take into account the intermediate
ones, nor the raw data.
Delphi events are reconstructed several times, usually with different configuration
parameters, reflecting the improvement in the understanding of the detector
performance and physics goals. Each of these processing stages generates a new version
of the output data and does not overwrite the earlier one, which stays available
and may optionally be deleted later. Data versioning is not considered in the
object model and only the last, and presumably the best, version is included.

² If one creates an object data model, how likely is one to get it right when one's experience is
limited to only one non-object-oriented data management system?

It is important to distinguish between transient and persistent data models.
The persistent model consists of objects whose lifetime exceeds the lifetime
of the user programs, i.e. which can be stored on external media. The transient
model is the logical structure seen by the users in their physics applications. At
present the differences between the transient model and the persistent model based
on Objectivity/DB are purely technical and will not be discussed in this note.

The object data model consists of three major parts:
• object data part;
• object interfaces;
• datastore-specific part.

The object interfaces are closely related to the I/O system and will be described
elsewhere later.

The object model of Delphi has been developed taking into account the following
considerations:
• Delphi data information;
• user access patterns;
• implementation requirements of ZEBRA and Objectivity/DB.

2.1 Data Information Classification


The Delphi data are organised in collections called datasets. Each dataset contains
the events of one year of LEP data taking and corresponds to a given software
reprocessing. The number of events varies substantially from one year to another,
increasing during the LEP1 (1991-1995) and LEP2 (1995-1997) energy periods.
The datasets are subdivided into runs, each one covering a data taking time with
stable LEP conditions. The runs are assigned run numbers as determined by
the online system. They are spread over several files whose length is chosen to be
optimal for the output media and further manipulations.
The run file is the smallest event collection unit of Delphi. The information associated
with the LEP running conditions, such as the beam energy and position (spot), detector
quality, luminosity etc., is organised on a run file basis.
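The organisation described above (dataset, runs, run files, events) can be pictured with a small data structure. The following is only a sketch: the class names, field names and all numerical values are hypothetical and chosen for illustration; they are not the classes of the Delphi model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunFile:
    """Smallest event collection unit; running conditions are kept per run file."""
    run_number: int          # assigned by the online system
    file_index: int          # a run may be spread over several files
    beam_energy: float       # hypothetical field, units left unspecified
    luminosity: float        # hypothetical field, units left unspecified
    event_numbers: List[int] = field(default_factory=list)


@dataclass
class Dataset:
    """Events of one year of data taking for one software reprocessing."""
    year: int
    processing: str
    run_files: List[RunFile] = field(default_factory=list)

    def run_numbers(self) -> List[int]:
        return sorted({rf.run_number for rf in self.run_files})

    def n_events(self) -> int:
        return sum(len(rf.event_numbers) for rf in self.run_files)


# Illustrative values only, not real Delphi data.
dataset = Dataset(year=1994, processing="example-reprocessing")
dataset.run_files.append(RunFile(1001, 1, 45.6, 0.8, [1, 2, 3]))
dataset.run_files.append(RunFile(1001, 2, 45.6, 0.7, [4, 5]))
dataset.run_files.append(RunFile(1002, 1, 45.6, 0.9, [1, 2]))
```

Here run 1001 is spread over two run files, so the dataset contains two runs, three run files and seven events.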

Year    Number of run files    Number of events

1991           3 894                  713 019
1992           4 302                1 706 622
1993           2 746                2 153 805
1994           5 177                4 983 339
1995           2 984                3 114 581

1995             516                  215 809
1996           1 804                  409 806
1997           2 459                  997 349

Table 1: Number of run files and events per year.

Delphi events consist of:
• Pilot information: Event Identifier (run, event number) and a summary of its
  topological and global physics characteristics;
• Vertices and Tracks;
• Detector information for the tracks;
• Information related to the whole event, e.g. b-tagging, V0 decays;
• Associations between different information elements, e.g. vertices and tracks.
A rough estimate of some important Delphi data parameters, biased toward upper
limits, for both the LEP1 and LEP2 energy periods is shown in Table 2.

Symbol  Parameter                                             Estimate

s       number of datasets per LEP period                           10
r       number of run files per LEP period                       20000
e       number of events per run file                            10000
v       number of vertices per event                                10
t       number of tracks per event                                 100
l       number of unassociated track elements per event            100
h       number of unassociated VD hits per event                   100
d       number of V0 decays per event                               50

Table 2: Main Delphi data parameters.
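To get a feeling for the scale implied by Table 2, the estimates can be multiplied out. The sketch below is simple arithmetic on the table values. Since those values are biased toward upper limits, the results are upper bounds per LEP period, not measured totals. The function and key names are hypothetical.

```python
# Upper-limit estimates copied from Table 2.
PARAMS = {"s": 10, "r": 20000, "e": 10000, "v": 10,
          "t": 100, "l": 100, "h": 100, "d": 50}


def totals_per_lep_period(p):
    """Multiply the per-run-file and per-event estimates into period totals."""
    events = p["r"] * p["e"]  # run files x events per run file
    return {
        "events": events,
        "vertices": events * p["v"],
        "tracks": events * p["t"],
        "v0_decays": events * p["d"],
    }


totals = totals_per_lep_period(PARAMS)
```

With these inputs the bound is 2 x 10^8 events and 2 x 10^10 tracks per LEP period.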

2.2 User access patterns
Data access optimization is an important goal of the object data model. Better
access performance of the physics applications can be achieved by storing
together the information that is likely to be used together during data
analysis. The most frequent user access patterns have been established on the basis of the
experience gained in analysing the Delphi data.
At the dataset level, users access either all events of the set sequentially or
a small subset of randomly selected events (usually a few percent). With
increasing processing homogeneity, access to and analysis of several datasets
simultaneously (e.g. all LEP1 data) should be envisaged as well.
At the event level, users need to access not only the event information
as a whole but also different parts of it, depending on the aim of their analysis:
• Event Identifier;
• Pilot information;
• Tracks only;
• Tracks + Vertices;
• Tracks + Vertices + Detector information.

2.3 Implementation requirements


When ZEBRA is used as the persistent data store, only two information groups can
be considered for the event: the Pilot and the rest of the event.
Objectivity/DB as a data store allows the information from different
parts of the event to be clustered in an arbitrary way. This can be done using the
Objectivity/DB storage classes:
• A basic object is the fundamental storage unit, e.g. vertices, tracks etc.;
• A container is a collection of basic objects. They are physically clustered
  together in memory, so access to all basic objects in a single container is very
  efficient;
• A database is a collection of containers;
• A federated database contains user-defined databases and the object model
  (or schema) that describes all public class definitions.

Each basic object is included within a container, each container is included within
a database, and each database belongs to a single federated database.
Objectivity/DB imposes some requirements and restrictions on each of these storage classes.
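The containment rule just stated (every unit belongs to exactly one unit of the next level) can be sketched in a few lines. This is a plain Python model of the hierarchy written for illustration only; it does not use or imitate the Objectivity/DB programming interface, and the names StorageUnit, add and path, as well as the example names "LEP1" and "RUN001", are assumptions.

```python
class StorageUnit:
    """One level of the storage hierarchy; it has at most one parent."""
    child_type = None  # class of the units this level may contain (None: leaf)

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add(self, child):
        if self.child_type is None or not isinstance(child, self.child_type):
            raise TypeError(
                f"{type(self).__name__} cannot contain {type(child).__name__}")
        if child.parent is not None:
            raise ValueError(
                f"{child.name} already belongs to {child.parent.name}")
        child.parent = self
        self.children.append(child)
        return child

    def path(self):
        """Names from the federated database down to this unit."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))


class BasicObject(StorageUnit):
    pass


class Container(StorageUnit):
    child_type = BasicObject


class Database(StorageUnit):
    child_type = Container


class FederatedDatabase(StorageUnit):
    child_type = Database


fdb = FederatedDatabase("LEP1")
db = fdb.add(Database("RUN001"))
tracks = db.add(Container("TrackCont"))
track = tracks.add(BasicObject("track-1"))
```

In this model, an attempt to place track-1 in a second container or to put a container directly into the federated database is rejected.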
Each federated database (FDB) contains by definition a schema describing the
object model. The existence of many FDBs within the same data model makes it
difficult to keep the schemas consistent. Moreover, Objectivity/DB allows
only one FDB to be accessed per process. This implies the need to minimize the
number of FDBs per model. The Delphi object model has only two FDBs,
containing the LEP1 and LEP2 data respectively. Their schemas are identical for
the moment, but this partition allows the model for LEP2 to be modified if necessary in
the future.
The use of multiple databases (DB) and multiple containers is restricted
by Objectivity/DB. The number of DBs per FDB must not exceed 64K. A
DB is a physical file and its size is limited by the current file systems. Taking
these restrictions into account, each DB in the present object model contains the data
of one run file.
The number of containers per DB is limited to 32K. Containers are the clustering
units of Objectivity/DB and their design should reflect the user access patterns
(Section 2.2). Different parts of the same event are held in different containers, e.g.
track container, vertex container etc.

Container / objects                        Estimated number

Vertex container                                 e/b
    Vertex objects                               v·b
Track container                                  e/b
    Track objects                                t·b
    Track geometry objects                       t·b
TrackDetector container                          e/b
    TrackDetector Info List objects              t·b
    VD Hits (associated) objects                 10·t·b
Detector container                               e/b
    Detector Info List objects                   b
    Unass. TE List objects                       b
    Unass. TE objects                            l·b
    VD Hits (unass.) objects                     h·b
    V0 ID objects                                d·b
    Dstar List objects                           b

Table 3: Estimated number of containers per DB and objects per container.

The maximum number of basic objects per container depends on their size and
some FDB parameters (e.g. the page size). It can vary from a lower limit of 64K
up to 4G. In the case of Delphi data the number of objects can easily exceed the lower
limit, as one can see from Table 2, and some protection against possible
overflow should be provided. The chosen approach is to set a limit on the number of
events per container, thereby restricting the total number of objects in the related
containers. When the number of events exceeds the established limit, the next set of
containers is filled.
The estimated number of containers per DB and objects per container is shown
in Table 3, where b stands for the limit on the number of events per container and the other
symbols are defined in Table 2.
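As a worked example of this protection scheme, the sketch below derives a safe value of b from the estimates of Tables 2 and 3. It rests on assumptions that the note does not state: "64K" and "32K" are read as 65536 and 32768, and the object limit is taken to apply to the total number of objects of all kinds held in one container. The function names are hypothetical.

```python
import math

OBJECT_LIMIT = 64 * 1024     # assumed reading of the "64K" lower limit
CONTAINER_LIMIT = 32 * 1024  # assumed reading of the "32K" containers per DB

# Upper-limit estimates from Table 2.
P = {"e": 10000, "v": 10, "t": 100, "l": 100, "h": 100, "d": 50}

# Objects contributed by ONE event to each container (Table 3 with b = 1).
OBJECTS_PER_EVENT = {
    "Vertex": [P["v"]],
    "Track": [P["t"], P["t"]],
    "TrackDetector": [P["t"], 10 * P["t"]],
    "Detector": [1, 1, P["l"], P["h"], P["d"], 1],
}


def max_events_per_container(limit=OBJECT_LIMIT):
    """Largest b for which the fullest container stays within the limit."""
    worst = max(sum(counts) for counts in OBJECTS_PER_EVENT.values())
    return limit // worst


def containers_per_db(b):
    """Each of the container kinds of Table 3 needs about e/b containers."""
    return len(OBJECTS_PER_EVENT) * math.ceil(P["e"] / b)


def container_set_index(event_sequence_number, b):
    """0-based index of the container set that receives a given event."""
    return event_sequence_number // b


b = max_events_per_container()
```

Under these assumptions the TrackDetector container is the fullest one (1100 objects per event), which gives b = 59 and 680 containers per DB, well below the 32K limit.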

2.4 Federated Database structure


The final design of the Federated Database structure is based on the considerations
explained above (Sections 2.1-2.3). An FDB consists of one main database, MAIN DB, and
many run file databases, RUNxxx DB. Their structures are shown in Figure 1 and
Figure 2 respectively.
The MAIN DB has one dataset container, DatasetCont, with objects containing
general information about the datasets. Run file related information and the Event
Identifiers (EID) of all events of a run file are stored in the corresponding run file
container EIdRxxxCont. Each run file container has one Run Information Identifier
(RID) object RunInfoId, one Run Information object RunInfo with run file
related information such as beam energy and position (spot), luminosity, etc., and many
Event Identifier objects EIdRxxx, one for each of the events belonging to this run file.
A typical task in physics applications is to locate all EIDs belonging to a
dataset known by its name. This name unambiguously defines the dataset object
Dataset in the container DatasetCont. The dataset object has many unidirectional
associations to RID objects. From the RID objects one can navigate to the
corresponding run file container EIdRxxxCont and get all EIDs belonging to it.
Each EID has an association to the corresponding Event object, which contains general
information about the event, e.g. the Pilot record. Event objects are held in the event
container EventCont of the run file database RUNxxx DB (Figure 2).
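The navigation path described above (dataset name, Dataset object, RID objects, run file container, EIDs, Event object) can be sketched as follows. Plain Python references stand in for database associations. The class names loosely follow the object names of the text, but the fields, the helper make_run and the dataset name "example" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:                  # held in EventCont of a run file database
    run: int
    number: int
    pilot: str


@dataclass
class EId:                    # held in a run file container of the main database
    run: int
    number: int
    event: Event              # association EID -> Event


@dataclass
class EIdCont:                # stands for one EIdRxxxCont
    run_info: dict
    eids: List[EId] = field(default_factory=list)


@dataclass
class RunInfoId:              # RID object
    container: EIdCont        # navigation RID -> run file container


@dataclass
class Dataset:
    name: str
    rids: List[RunInfoId] = field(default_factory=list)  # unidirectional


def event_ids(dataset_cont: Dict[str, Dataset], name: str) -> List[EId]:
    """Locate all EIDs of the dataset known by its name."""
    dataset = dataset_cont[name]
    return [eid for rid in dataset.rids for eid in rid.container.eids]


def make_run(run: int, n_events: int) -> RunInfoId:
    """Build one run file container with made-up content."""
    cont = EIdCont(run_info={"run": run})
    for number in range(1, n_events + 1):
        event = Event(run, number, pilot=f"pilot {run}/{number}")
        cont.eids.append(EId(run, number, event))
    return RunInfoId(container=cont)


dataset_cont = {"example": Dataset("example", [make_run(101, 2), make_run(102, 1)])}
found = event_ids(dataset_cont, "example")
```

Each element of found carries a reference to its Event object, so a selection based on the Pilot record can follow without a second lookup by name.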
Events are divided into five parts according to the most likely user access patterns
(see Section 2.2). Each part is held in a separate container: the event object in EventCont,
vertices in VertexCont, tracks in TrackCont, detector information related
to the tracks in TrkDetectorCont, and detector information related to the whole event
in EvtDetectorCont. All parts have bidirectional associations, allowing free navigation
between them.
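A bidirectional association simply means that each side holds a link to the other. The sketch below is a strong simplification: it treats each of the five parts of one event as a single object and links every pair, whereas in the real model the associations would connect individual objects such as tracks and vertices. The names EventPart, associate and store_event are hypothetical.

```python
CONTAINERS = ("EventCont", "VertexCont", "TrackCont",
              "TrkDetectorCont", "EvtDetectorCont")


class EventPart:
    """One of the five parts of an event, stored in its own container."""

    def __init__(self, container, payload):
        self.container = container
        self.payload = payload
        self.links = {}  # container name -> associated part of the same event


def associate(a, b):
    """Create a bidirectional association between two parts."""
    a.links[b.container] = b
    b.links[a.container] = a


def store_event(payloads):
    """Split one event into its parts and link every pair of parts."""
    parts = {name: EventPart(name, payloads[name]) for name in CONTAINERS}
    names = list(CONTAINERS)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            associate(parts[first], parts[second])
    return parts


parts = store_event({name: f"{name} data" for name in CONTAINERS})
```

An analysis that starts from the tracks can then reach the vertices, or the event object, by following a link instead of searching the other containers.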

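[Figure 1 (diagram): MAIN DB contains the dataset container DatasetCont, holding the
objects Dataset 1, Dataset 2, ..., and the run file containers EIdRxxxCont, EIdRyyyCont, ...,
each holding RunInfoId, RunInfo and EIdRxxx (respectively EIdRyyy) objects.]

Figure 1: Structure of main database.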

[Figure 2 (diagram): RUNxxx DB contains the containers EventCont, VertexCont, TrackCont,
EvtDetectorCont and TrkDetectorCont.]

Figure 2: Structure of run file databases.

Acknowledgements
We are grateful to Delphi management for their support of this project. Special
thanks to Dirk Duellmann for the helpful discussions and consultations concerning
the Objectivity/DB.


References
[1] J. Zoll et al., "ZEBRA User Guide", CERN Program Library Q100/Q101.
[2] "Using Objectivity/C++", Version 4, Objectivity, Inc.
