A Formal Analysis of HL7 Version 2.x

704 User Centred Networked Health Care

A. Moen et al. (Eds.)


IOS Press, 2011
© 2011 European Federation for Medical Informatics. All rights reserved.
doi:10.3233/978-1-60750-806-9-704

A Formal Analysis of HL7 Version 2.x


Frank OEMIG a,1, Bernd BLOBEL b
a Agfa Healthcare, Bonn, Germany
b eHealth Competence Center, University Hospital Regensburg, Regensburg, Germany

Abstract. Working interoperability requires not only harmonized system
architectures but also a shared interpretation of the technical specifications that
guide the development processes. Sometimes, however, a specification does not make
its underlying model explicit, which hinders a coherent understanding. This paper
analyses the structures of the HL7 Version 2.x family of communication standards
and presents a UML class diagram for it.

Keywords. HL7 Version 2.x, Communication Standard, UML class diagram,
Interoperability

1. Introduction

The utilization of communication standards in healthcare is normally enforced by
jurisdictional and user requirements. To support interoperable implementations of those
standards, an explicit model is not only helpful but a necessary prerequisite. Many of
the discussions between vendors and customers about the correct use of HL7
version 2.x [1] arise because such a model simply does not exist, at least not
officially [2]. Nevertheless, the way the standard is written and the details it contains
allow the model to be extracted by reverse engineering.
In the following, we elaborate on those details and illustrate them with a UML
class diagram [3].

2. Methods

To help with the development of such a model, a fine-grained analysis of the HL7 v2.x
communication standards was performed by carefully examining the standard documents
from v2.1 up to v2.7. All identified information items were then modeled using UML
class diagrams.

3. Results

The resulting class diagram is organized top-down. It starts with the relation of events
to messages. It should be noted that a single event may (indirectly) trigger three
different messages, i.e. besides the initial payload, up to two acknowledgements may be
sent in return. This fact may cause difficulties at runtime if it is not considered
during implementation.

1 Corresponding Author: Frank Oemig, Email: frank.oemig@agfa.com; Phone: +49-228-2668-4781; Agfa
HealthCare GmbH, Konrad-Zuse-Platz 1-3, 53227 Bonn, Germany. URL: http://www.agfa.com/healthcare

Next, a message has a message structure, which can be identified by a unique
identifier. In principle, such a message structure is identical to a single segment
group. However, the standard does not manage it this way. A segment group has a
recursive structure, because it is a sequentially ordered list of segments and segment
groups.
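
The recursive definition of a segment group can be illustrated with a minimal sketch. The class names, the helper functions and the example structure below are assumptions made for illustration; they are not taken from the standard or from Figure 1.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Segment:
    code: str  # three-character segment code, e.g. "PID"


@dataclass
class SegmentGroup:
    name: str
    # Sequentially ordered members: segments or nested segment groups.
    members: List[Union[Segment, "SegmentGroup"]] = field(default_factory=list)


def flatten(group: SegmentGroup) -> List[str]:
    """Return all segment codes of a group in their defined order."""
    codes: List[str] = []
    for member in group.members:
        if isinstance(member, SegmentGroup):
            codes.extend(flatten(member))  # recursion into the nested group
        else:
            codes.append(member.code)
    return codes


def depth(group: SegmentGroup) -> int:
    """Return the nesting depth of a group (1 = no nested groups)."""
    nested = [depth(m) for m in group.members if isinstance(m, SegmentGroup)]
    return 1 + max(nested, default=0)


# Hypothetical structure, for illustration only.
visit = SegmentGroup("VISIT", [Segment("PV1"), Segment("PV2")])
patient = SegmentGroup("PATIENT", [Segment("PID"), visit])
structure = SegmentGroup("EXAMPLE_STRUCTURE", [Segment("MSH"), patient])
```

In this sketch the message structure is simply the outermost segment group, which mirrors the observation that both concepts coincide in principle.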

Figure 1. HL7 v2.x formal model as a UML class diagram



Each segment is described by a three-character code, a name and a description, and
consists of fields representing data elements. As such, a data element has an identifier
(a five-digit number), a name, a description and a length.
A recurring point of discussion is whether a field's attributes belong to the field
itself or to its relationship to a segment, i.e. its use within that segment.
Common understanding and the detailed use in individual segments have led to the
decision to place the position, the optionality and the cardinality on the individual
relationship. As a consequence, the datatype of a specific field stays the same across
all its usages in different segments. In return, this increases the amount of work
needed to keep the standard documents consistent when maintaining or editing them, and
it also provokes numerous discussions about the correct datatype or how it can be
improved in the next version of the standard.
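
The split between attributes of the field itself and attributes of its use within a segment can be sketched as follows. This is a simplified illustration, not the authors' model: the class names, the item number "99999" and the Z-segments "ZAA" and "ZBB" are hypothetical, and the description attributes are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class DataElement:
    """Attributes that belong to the field itself."""
    item_number: str  # five-digit identifier
    name: str
    datatype: str     # identical in every segment that uses the element
    length: int


@dataclass(frozen=True)
class FieldUsage:
    """Attributes that belong to the use of a data element in one segment."""
    element: DataElement
    position: int
    optionality: str                # e.g. "R" (required) or "O" (optional)
    max_repetitions: Optional[int]  # 1 = not repeating, None = unbounded


@dataclass
class SegmentDefinition:
    code: str  # three-character code
    name: str
    usages: List[FieldUsage] = field(default_factory=list)

    def usage_at(self, position: int) -> FieldUsage:
        """Return the field usage defined at the given position."""
        for usage in self.usages:
            if usage.position == position:
                return usage
        raise KeyError(f"{self.code}-{position} is not defined")


# Hypothetical data element and Z-segments, used for illustration only.
example_id = DataElement("99999", "Example Identifier", "ST", 20)
zaa = SegmentDefinition("ZAA", "Example segment A",
                        [FieldUsage(example_id, 1, "R", 1)])
zbb = SegmentDefinition("ZBB", "Example segment B",
                        [FieldUsage(example_id, 4, "O", None)])
```

Because the datatype is stored once on the data element, both segments necessarily agree on it, while position, optionality and cardinality may differ per usage.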
The datatypes control the contents of the fields in the form of components. A datatype
may either be simple, i.e. contain only a single component, or consist of a set of
components, so that again a recursive definition is given: each component makes use of a
datatype, which can itself be simple or complex. This fact results in another problem of
the standard: in principle, datatypes can be nested to arbitrary depth. However, the
header information limits the nesting to two levels, so that fields can have components
and subcomponents only. Datatypes that violate this requirement must be identified by
hand. A good example is "DR" (date range), consisting of two "TS" (timestamp) with the
time and timezone as components. Hence a "DR" cannot be used as a component.
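
The manual check described above could be automated along the following lines. The sketch assumes a strongly simplified datatype catalogue in which "TS" has two simple components, as described in the text; the dictionary and the function names are hypothetical.

```python
from typing import Dict, List

# Simplified, illustrative definitions: each datatype maps to the datatypes
# of its components; an empty list marks a simple datatype.
DATATYPES: Dict[str, List[str]] = {
    "ST": [],
    "TS": ["ST", "ST"],  # time and timezone, as described in the text
    "DR": ["TS", "TS"],  # date range: two timestamps
}

MAX_LEVELS_BELOW_FIELD = 2  # components and subcomponents


def levels(datatype: str) -> int:
    """Return how many nesting levels a datatype needs below itself."""
    components = DATATYPES[datatype]
    if not components:
        return 0
    return 1 + max(levels(component) for component in components)


def usable_as_field(datatype: str) -> bool:
    """A field may use components and subcomponents."""
    return levels(datatype) <= MAX_LEVELS_BELOW_FIELD


def usable_as_component(datatype: str) -> bool:
    """A component already occupies one level below the field."""
    return 1 + levels(datatype) <= MAX_LEVELS_BELOW_FIELD
```

With these definitions, "DR" needs two levels and therefore fits into a field, but it would need a third level when used as a component.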
An abstract simple datatype can be subclassified as being either uncoded, coded or
structured. Structured datatypes, in contrast to uncoded datatypes ("ST"/string and
"TX"/text), provide information about the format of the represented data, like "TS" for
timestamps. A coded datatype ("IS" or "ID") has a relationship to a table specifying
possible values including their description. Such a table is either prespecified by HL7
and does not allow for changes (i.e. fixed), or may contain example values which can
be redefined for site-specific adaptations (i.e. variable). Within the standard, this
specification has been duplicated: the datatype "CNE" (coded no exceptions) contains a
component with datatype "ID", which is bound to an HL7-defined table. Correspondingly,
"CWE" (coded with exceptions) refers to "IS" with user-defined tables. It would be
enough to have this specification available in one place only. As it stands, consistency
must be ensured manually during maintenance, which is a tedious process.
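
The distinction between fixed and variable tables can be sketched as follows. The class, the table identifiers "9001" and "9002" and all table values are hypothetical and serve only to illustrate the behaviour described above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CodeTable:
    table_id: str
    fixed: bool  # True: HL7-defined, no changes; False: user-defined
    values: Dict[str, str] = field(default_factory=dict)  # code -> description

    def redefine(self, code: str, description: str) -> None:
        """Add or replace a value; only allowed for variable tables."""
        if self.fixed:
            raise ValueError(f"table {self.table_id} is fixed")
        self.values[code] = description

    def contains(self, code: str) -> bool:
        return code in self.values


# Hypothetical tables and values, for illustration only.
hl7_defined = CodeTable("9001", fixed=True, values={"A": "Example A"})
user_defined = CodeTable("9002", fixed=False, values={"X": "Example X"})

# A site-specific adaptation is possible for the variable table only.
user_defined.redefine("Y", "Site-specific value")
```

In this reading, a field of datatype "ID" would be validated against a table such as `hl7_defined`, and a field of datatype "IS" against a table such as `user_defined`.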
The encoding of the messages and their details is controlled by a set of delimiters.
Whether they are required or optional, and either fixed or variable, is noted on the
left-hand side in Figure 1. The standard defines a set of default delimiters, which can
be adjusted for use with legacy systems that do not allow those special characters.
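
A parser that honours variable delimiters reads them from the start of the message instead of assuming the defaults. The following minimal sketch assumes that the message begins with the MSH segment, that the character directly after "MSH" is the field separator and that the encoding characters follow in the order component, repetition, escape, subcomponent and (from v2.7, optionally) truncation. The function names and the sample messages are hypothetical.

```python
from typing import Dict, List

SEGMENT_TERMINATOR = "\r"  # fixed delimiter: carriage return, not CR/LF


def read_delimiters(message: str) -> Dict[str, str]:
    """Read the variable delimiters declared at the start of the message."""
    if not message.startswith("MSH") or len(message) < 5:
        raise ValueError("message must start with an MSH segment")
    field_separator = message[3]
    # The encoding characters run up to the next field separator.
    encoding_chars = message[4:].split(field_separator, 1)[0]
    names = ["component", "repetition", "escape", "subcomponent", "truncation"]
    delimiters = {"field": field_separator}
    delimiters.update(zip(names, encoding_chars))  # missing ones stay absent
    return delimiters


def split_message(message: str) -> List[List[str]]:
    """Split a message into segments and each segment into raw fields."""
    delimiters = read_delimiters(message)
    segments = [s for s in message.split(SEGMENT_TERMINATOR) if s]
    return [segment.split(delimiters["field"]) for segment in segments]


# Hypothetical messages: one with the default delimiters, one with others.
default_msg = "MSH|^~\\&|SEND|RECV\rZAA|one^two|three\r"
custom_msg = "MSH#*+!%#SEND\rZAA#x*y\r"
```

Only the segment terminator is hard-coded here; all other delimiters are taken from the message, which is the additional development effort that many implementations avoid.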

3.1. Detailed Comments

The single letters (a – u) in Figure 1 mark classes and relations which are explained
with some additional remarks in Table 1.

Table 1. Description of the marks (a-u) in the UML class diagram of the HL7 v2.x formal model.

Ref  Description
a)   The relationship between a message and its message structure is not always
     defined correctly, i.e. more than one structure exists or the structure is
     inconsistently defined.
b)   Segment groups allow for arbitrary nesting.
c)   Data types allow for arbitrary nesting. But due to the standard encoding rules
     (ER7) no more than two recursions are allowed. As explained above, this fact
     must be ensured manually.
d)   The identification of the correct delimiter depends on the use of the datatype
     as a field or component and therefore cannot be specified directly.
e)   Variable delimiters require a higher development effort, so that some
     implementations do not take care of them.
     A fixed delimiter is necessary in conjunction with segments, so that a parsing
     engine can clearly identify segments as parts of a message.
     Quite often, the correct character "CR" (carriage return) is mistakenly written
     as "CR/LF".
f)   Most of the implementations assume fixed delimiters; the German message profiles
     constrain them to be fixed to the given default. In return, the way messages are
     created or processed cannot be tested. This is a great barrier when it comes to
     certification of an interoperable encoding.
g)   Originally, four of the five delimiters must be given. In later versions this is
     corrected so that all delimiters must be present.
h)   The structures for the three messages initiated by a single trigger event are
     defined as payload, transport acknowledgement and application acknowledgement.
     Routing the initial message as a broadcast to a set of recipients may lead to a
     large number of acknowledgement messages in return. A coherent processing of
     messages across different applications becomes a challenge.
i)   The workflow is not explicitly defined in the standard documents. A new proposal
     for v2.8 should help to avoid the most problematic mistakes. Furthermore, IHE
     Technical Frameworks [4] specify workflows in the form of integration profiles.
j)   Fixed segments are fully specified, whereas variable segments may have the last
     field being added as a new field as often as necessary.
k)   Sometimes a table is assigned to a complex field; here it is implicitly meant
     that the table is assigned to a component.
l)   Early v2 versions require an intermediate layer. The datatype "CM" (composite)
     is used wherever a set of components is needed without specifying the necessary
     details. As a consequence, there is no simple way to handle it. In the meantime,
     each datatype has a clear definition.
m)   Datatypes and fields can either be simple or complex: in principle this is the
     same fact.
n)   Code tables with [n..m] have a different cardinality.
o)   In principle, the delimiters are table values themselves.
p)   In the standard some of the classes are represented as tables as well; 0003
     (events), 0076 (message types) and 0354 (message structures) are examples
     thereof.
q)   The delimiters are defined with default values. Most implementations cannot
     handle alternative values.
r)   The OID is assigned to the table but not to the codesystem because no separation
     is made for the different versions. In order to handle the different value sets
     correctly, an OID must be assigned to the codesystem individually, requiring an
     investigation about the semantics.
s)   The maximum length is officially normative but with a back door which is closed
     with v2.7. For a correct representation, minimum and conformance lengths are
     introduced.
t)   Starting with v2.7 a new delimiter is introduced to indicate information which
     has been truncated by the sending system. But for backward compatibility reasons
     this new delimiter is optional.
u)   Components and subcomponents are realized by data types.

4. Discussion

In principle, a message structure is nothing other than a segment group. Hence the
question can be raised whether a separation into two distinct classes is necessary.
Another question is the proper use and representation of character sets. Originally,
the standard only allows for 7-bit ASCII characters, although common practice works
directly with 8-bit ISO-8859 character sets. An elaboration of the associated problems
is worth another paper [5].

5. Conclusions

The extraction of a UML-based class diagram is possible. Its availability will reduce
expensive discussions and may prevent incorrect implementations, thereby enhancing
semantic interoperability among different applications.
The next logical step is the alignment of this UML class diagram with the generic
component model (GCM) [6] in order to abstract it to a communication standards ontology
(CSO) [7, 8], which can then be used to further improve semantic interoperability
among different applications.

Acknowledgments. The authors are indebted to their colleagues from HL7 for their kind
collaboration.

References

[1] HL7 Inc., Ann Arbor. "HL7 Version 2.x", http://www.hl7.org
[2] Oemig F, Dudeck J. "Problems in developing a comprehensive HL7 database", AMIA Fall Symposium,
1996, Hanley & Belfus Inc., ISBN: 1-56053-208-4, p. 841.
[3] UML, the Unified Modeling Language, http://www.uml.org
[4] IHE, Integrating the Healthcare Enterprise, http://www.ihe.net
[5] Oemig F, Blobel B. "Character Sets: An invisible Pre-requisite towards Cross-Border Interoperability?",
EFMI Special Topic Conference, Slovenia, April 2011, accepted paper.
[6] Oemig F, Blobel B. "Harmonizing the semantics of technical terms by the Generic Component Model",
10th International Special Topic Conference of the European Federation for Medical Informatics in
Reykjavik, Iceland, 2-4 June 2010, IOS Press, ISBN: 978-1-60750-562-5, 115-121.
[7] Oemig F, Blobel B. "Semantic Interoperability between Health Communication Standards through
Formal Ontologies", Studies in Health Technology and Informatics 150, (2009), IOS Press,
ISBN: 978-1-60750-044-5, 200-204.
[8] Oemig F, Blobel B. "A Communication Standards Ontology using Basic Formal Ontologies", Studies in
Health Technology and Informatics 156, (2010), IOS Press, London, ISBN: 978-1-60750-564-8,
105-113.
