
focus: developing scientific software

Scientific Software as Workflows: From Discovery to Distribution

David Woollard and Chris A. Mattmann, NASA Jet Propulsion Laboratory and University of Southern California

Nenad Medvidovic and Yolanda Gil, University of Southern California

In the last 30 years, science has undergone a radical transformation. In place of test tubes and optics benches, chemists, physicists, and experimental scientists in a host of other disciplines are using computer simulation to discover and validate new science. This "in silico" experimentation has been fueled by numerous computer science advances, including the ability to archive and distribute massive amounts of data and share hardware resources via the Grid.1

An activity central to in silico experimentation is orchestration: assembling scientific codes into an executable system with which to experiment. Orchestration of in silico experiments, much like its closely related cousin in the business world,2 is a complex task involving data management (locating data, reformatting, and so on), managing input parameters for executables, and handling dependencies between processing elements.

One potentially useful tool for orchestration is the workflow environment. At NASA's Jet Propulsion Laboratory (JPL), for example, scientists and engineers have developed such environments to process data from instruments, satellites, and rovers. The recently landed Phoenix mission to Mars and the Orbiting Carbon Observatory, an Earth-observing spectrometer mission set to launch in 2008, both use workflow environments to process raw instrument data into scientific products for the external science community.

Despite these early-adopter efforts, scientific workflow environments haven't yet reached a large user base. Many significant challenges in this domain remain unaddressed, despite the many scientific workflow environments from which to choose.3 One such challenge is that there's no standard workflow model or fundamental "science" of scientific workflows. At "Challenges of Scientific Workflows," a recent US National Science Foundation workshop chaired by one of us, this problem was cited as a challenge facing workflow researchers today.4

Requirements for these environments thus vary significantly among applications. This suggests that we need a taxonomy of workflow environments based on the scientific research activities that they support. Our experience shows that classifying these environments according to the phases of in silico research to which they apply is useful for scientists interested in adopting workflow technology. Each phase has distinct scientific workflow requirements. By making scientists aware of these requirements, we intend to better inform their selection of such technologies.

Orchestrating experiments
Before discussing the requirements for scientific workflow environments, we must examine the workflow's role in scientific experimentation. By framing our discussion with an understanding of scientists' current experimentation practices, we intend to clarify the process of evaluating these environments for a given application.

0740-7459/08/$25.00 © 2008 IEEE · IEEE Software, July/August 2008


Figure 1. A basic workflow environment. The workflow engine parses the static workflow model, executes the workflow stages, and accesses ancillary services (resource discovery, fault registry, provenance registry, and so on). The ancillary services encompass many of the non-control-flow aspects of script-based orchestration.

Scripts and other methods
Scientists have solved the problem of orchestration through various methods, including scientific-programming environments such as IDL (Interactive Data Language) and Matlab. The predominant orchestration mechanism, however, is the script. Scripting languages such as Perl (and more recently Python) let scientists perform many common orchestration tasks, including

■ specifying overarching process control flow,
■ running precompiled executables (via command line or runtime bindings), and
■ reformatting input and output data.

In common software engineering parlance, these scripts act as "glue code" between more well-defined software modules. A script captures the experiment: its setup (in the form of input parameters), procedure (the control flow, or execution steps), and record of results (formatting and cataloging outputs). So, it's integral to the overall scientific effort and is an important artifact in its own right.

However, script-based orchestration has problems that make workflow modeling an attractive alternative. Scripts are difficult to maintain and easily obfuscate the developer's original intention. Lack of inherent structure or design guidelines makes script-based orchestration a largely ad hoc process. Unlike traditional glue code, orchestration in scientific workflows can't be treated as throwaway!

Scientific workflow environments
In scientific workflow environments, workflow models represent high-level scientific tasks as data-processing stages (workflow stages) and the data dependencies between the stages. Workflow environments map the stages onto computational (often Grid-based) resources and plan the data movements that will satisfy the dependencies. This mapping is often called a workflow instance or concrete workflow. A workflow engine steps through the instance, executing the stages and managing each stage's I/O requirements as specified by the model. This is akin to the control-flow functionality of script-based orchestration.

As Figure 1 illustrates, a workflow environment consists of not only the workflow engine but also ancillary services. These services encompass many of the non-control-flow aspects of script-based orchestration, including resource discovery for accessing data, fault handling, and data provenance cataloging. Although resource discovery services sometimes handle data-reformatting issues, workflow environments also commonly use a preprocessing workflow stage to handle format mismatches.

Characterizing workflow environment requirements
Although good taxonomies of workflow environments exist,3,5 each takes a bottom-up approach, differentiating workflow approaches by the type of model used (for example, Petri nets versus directed acyclic graphs) or by the strategy used in allocating computational resources. Although such characterization can benefit software engineers and workflow practitioners, we contend that a top-down taxonomy based on the scientific goals addressed is more useful to scientists interested in adopting workflow technologies.

Workflow phases
Like all scientific endeavors, in silico science has distinctive phases. To take an example from biochemistry, in the early 1920s, John Macleod and Frederick Banting were the first to successfully isolate and extract insulin and are credited with its discovery. Once scientists refined their understanding of insulin's structure and properties, they developed techniques for producing insulin in large quantities, eventually settling on genetically engineering synthetic insulin in the late 1970s.

Our three phases of in silico science mostly mirror the processes of in vivo and in vitro science:


1. During discovery, scientists test algorithms and techniques (that is, they explore the scientific solution space) and arrive at a process that yields the desired result. This is what Jeremy Kepner called the "lone researcher process."6
2. Production, akin to the chemical-engineering process that synthesized insulin in large quantities, is the engineering and scientific effort to reproduce the discovered process on a large scale.
3. During distribution, scientists share and validate the process's results and formulate new research goals.

These phases form the basis for the following classification of workflows.

Discovery workflows. These workflows are rapidly reparameterized, letting scientists explore alternatives quickly to iterate an experiment until they've validated their hypotheses. Discovery workflow environments support such dynamic experimentation. This type of environment's high-level requirements include helping scientists formulate abstract workflow models and transforming abstract models into workflow instances.

Production workflows. As in the case of producing vast quantities of insulin, production workflow environments focus on repeatability. These environments should be able to stage remote, high-volume data sets, catalog results, and log or handle faults. Unlike discovery workflows, production workflows must incorporate substantial data management facilities. Scientists using production workflow environments care less about the means of abstract workflow representation than about the ability to automatically reproduce an experiment.

A production workflow environment's high-level requirements include handling the nonorchestration aspects of workflow formation, such as data-resource discovery and data-provenance recording. Such environments should also help scientists convert scientific executables into workflow stages, including providing means of accessing ancillary workflow services.

Distribution workflows. Unlike the first two environments, distribution workflow environments focus on data retrieval. They use distribution workflows to combine and reformat data sets and deliver these sets to scientists. These environments must support rapid specification (often using graphical techniques) and remote execution of abstract workflows.

Choosing from the workflow environment spectrum
It is important to note that many workflow environments exhibit the traits of more than one of the three workflow categories. Additionally, some scientific applications we've studied fall into more than one category. Whereas both discovery and distribution workflow environments must help scientists develop abstract workflows, distribution workflows emphasize the act of specification rather than the resulting workflow's dynamism. Likewise, production and discovery workflow environments both manage data, although production environments must do so with much greater autonomy to process large data sets.

Rather than present a clean taxonomy of scientific workflow environments for its own sake, we aim here to help scientists understand their own scientific workflow requirements, showing how this approach can help them choose an environment that caters to their scientific goals.

An illustration of the workflow phases
One example of in silico scientific research is biomedical-imaging research. Biomedical researchers use functional magnetic resonance imaging (fMRI) to provide more accurate images to aid doctors' diagnoses, explore the ties between thought and measurable biological activity, and help biologists explore brain functions. We can apply our taxonomy to aspects of fMRI research.

Because fMRI can be used to infer brain activity, researchers have explored its use in understanding which brain areas are activated during given tasks, how aging changes brain activity, and how diseases can affect brain function. These researchers have developed algorithms for detection in research settings, including optimization of voxel correlation to activity and better spatial-temporal resolution. Discovery workflows can support these activities.

Biomedical-imaging researchers also aid doctors in production-oriented activities. One such activity is patient imaging to help doctors plan surgery. fMRI images are taken, corrected, registered, and used to generate more accurate models of crucial brain function, which let surgeons more precisely plan tumor excision.

An example distribution activity is telediagnosis. Multiple doctors, possibly at different locations, simultaneously view the same set of fMRI images to consult with each other on a diagnosis. This scenario involves transmitting large sets of images across both local-area and wide-area networks, including the Internet. Picture Archiving and Communication Systems (PACS) often handle these scenarios.

Figure 2. The Wings system represents workflows in data- and execution-independent structures. Parallel computations over data sets (on the top left) are represented compactly in the Wings workflow on the right. The component representations on the bottom left express whether the components can process individual data sets (for example, component C-one) or collections of data (for example, component C-many) as input.

Current workflow research
Three of our current research projects illustrate our taxonomy. In showing how each project addresses the high-level requirements of a particular class of workflow application, we aim to

■ highlight salient research topics in the area and
■ illustrate our taxonomy with real-world workflow environments.

Discovery workflow environment: Wings
Formulation of discovery workflows is an act of exploration for scientists, whether they're systematically trying alternatives or haphazardly looking for a surprising result. To formulate useful workflows, scientists try different combinations and configurations of components and use new data sets or components.5 An environment for formulation of discovery workflows must

■ help users find components, workflows, and data sets on the basis of desired characteristics or functions;
■ validate the newly created workflows with respect to the requirements and constraints of both the components and data sets; and
■ facilitate the evolution and versioning of previously created workflows.

One such environment is Wings.7,8 To represent workflows, Wings employs semantic metadata properties of components and data sets. Wings uses workflow representations that are expressed independently of the execution environment. It also uses the Pegasus mapping and execution system, which submits and monitors workflow executions.7

In addition, Wings has constructs that compactly express the parallel execution of components to concurrently process subsets of a given data set.8 It encapsulates codes so that any execution requirements (such as target architectures or software library dependencies), as well as I/O data set requirements and properties, are explicitly stated. Code parameters that can be used to configure the codes (for scientists, these might correspond to different models in the experiment) are also represented explicitly and recorded within the workflow representation. A model of each code is created to express all such requirements, so the workflow system can use the codes flexibly as workflow components. Component classes are created to capture common properties of alternative codes and their configurations. The methodology for designing workflows, encapsulating components, and formalizing metadata properties appears elsewhere.9

Figure 2 shows how Wings represents workflows in data- and execution-independent structures. For example, parallel computations over data sets (on the top left of the figure) are represented compactly in the Wings workflow on the right. The component representations on the bottom left express whether the components can process individual data sets (for example, component C-one) or collections of data (for example, component C-many) as input. Wings exploits these constraints and represents the parallel computations as a collection of components (for example, node NC1) that are expanded dynamically into individual executable jobs depending on the size of the data set bound to the input variables (for example, to the variable Coll-G). For each of the new data products (for example, those bound to Coll-Z), Wings generates metadata descriptions based on the metadata of the original data sets and the component models.

Using these high-level and semantically rich representations, Wings can reason about component and data set properties to help users compose and validate workflows. Wings also uses these representations to generate the details that Pegasus needs to

■ map the workflows to the execution environment and
■ generate metadata descriptions of new data sets that result from workflow execution and that help track provenance of new data products.7


To facilitate workflow evolution, Wings expects each workflow to have its own namespace and ontology definitions. All workflows are represented in the Web Ontology Language (OWL) and follow the conventions of Web markup languages in importing ontologies and namespaces. Each ontology and namespace has a unique identifier and is never modified or deleted; new versions are given new identifiers. Related workflows can import shared ontology definitions and refer to common namespaces. We're exploring extensions to these capabilities for more manageable version tracking, particularly in collaborative settings.

Production workflow environment: SWSA
As we mentioned before, production workflow environments incorporate significant data management technology to reproduce scientific workflows on very large data sets. Production environments must not only handle the workflow's scientific requirements but also manage the significant engineering challenges of automatically processing large data sets.

Owing to the demands of automatic processing, a particular challenge in production workflow environments is integration of scientific codes. Locating remote data sets, handling faults appropriately, cataloging large volumes of data produced by the system, and managing complex resource mappings of workflow stages onto Grid and multigrid environments require scientific workflow stages to access a host of ancillary workflow services.

The Scientific Workflow Software Architecture (SWSA) project at JPL and the University of Southern California (USC) aims to provide a software architecture for workflow stages that clearly differentiates scientific code from the workflow stage's engineering aspects. SWSA breaks up the scientific algorithm into components, isolating communication with workflow services to specialized software connectors. This separation of concerns lets scientists and software engineers converse about workflow stage design without having to be experts in both science and software engineering.10

As the first step of creating the software architecture, SWSA decomposes scientific code into scientific kernels. These kernels, like code kernels in high-performance computing, are code snippets that implement a single scientific concept. In a graph-theoretic sense, a kernel is a portion of the call dominance tree for the source code's basic blocks that has a single source and a single sink; it has internal scope and single entry and exit points. To identify these kernels, users need to analyze the original source code's basic-block call graph to identify the calls made to execute each kernel (its control) and the data dependencies between kernels (its data flow). The current SWSA approach requires manual decomposition; we're exploring semiautomatic approaches based on software architecture recovery.

In the second step, SWSA wraps these kernels in a component interface, creating modules (see Figure 3). It implements the original program's control and data flow in a hierarchically composed exogenous connector.11 Finally, an invoking connector makes calls to ancillary workflow services. An engineer can manage the scientific workflow's engineering requirements through custom handlers that access services such as data discovery, data provenance registries, and fault registries.

Figure 3. An architecture for scientific-workflow stages, including connectors to workflow services (data discovery, provenance registry, fault registry). Users can specify the necessary communication between the workflow stage and ancillary workflow services in custom handlers that are isolated to the invoking connector for each module. The figure's custom handler logic is:

    handle(Event control) {
        data = dataService.get(control.reqData);
        processEvent = new Event(control, data);
        result = module.process(processEvent);
        provenanceService.register(result);
        faultService.register(result.exitCode);
        exoConn.dispatch(result);
    }

Distribution workflow environment: OODT
Once a production workflow has generated the necessary scientific information, that information must be appropriately disseminated to the scientific community to effectively communicate the results. As Daniel Crichton and his colleagues noted, as the study sample increases, so does the chance of discovery.12 In the Internet age, scientists have begun leaning on Internet technologies and large-scale data movement capabilities to exploit this principle, sharing their raw science data and analyses with colleagues throughout the world.

To move data to science users, distribution workflows leverage data movement technologies and data packaging (or repackaging) technologies. This process is underpinned by distribution scenarios specifying a data distribution's important properties (for example, the total volume to be delivered, the number of delivery intervals, or the number of users to send data to). As Figure 4 illustrates, distribution workflows typically have four distinct stages:

1. Data access. The scientific information produced by a production workflow is accessed (for example, from a database management system, a file system, or some other repository).
2. Data subsetting. If necessary, the data is packaged, repackaged, or "subsetted" (broken down into subsets) for its external science customers.
3. Movement technology selection. An appropriate data movement technology (for example, FTP, HTTP, or Bittorrent) is selected.
4. Data distribution. The data is transferred to the science community.

Figure 4. The four canonical stages of distribution workflows. Data is accessed, it's (potentially) subsetted, an appropriate movement technology is selected, and then the data is ultimately disseminated to the scientific community.

As an illustration of this workflow's complexity, consider this representative use case:

    More than 100 Gbytes of Planetary Data System (PDS) Spice (Spacecraft Planet Instrument Camera Matrix Events) kernel data sets (describing planetary-mission navigation information) and PDS data sets must be subsetted and sent across the wide-area network (WAN) from the PDS engineering node at JPL to the European Space Agency (ESA) and from JPL across the local-area network (LAN) to the (local) PDS Navigation and Ancillary Information Facility (NAIF) node. The data must be delivered securely (using port-based, firewall pass-through) to over 100 different engineering analysts and to project managers, totaling 10 distinct types of users. The users would like the data delivered in no more than 1,000 intervals. (An interval is a discrete data transfer event.) Owing to the underlying network capacities of the dedicated LAN to the NAIF and the WAN to the ESA, the data must be delivered in 1-Mbyte to 1-Gbyte chunks during each interval. The data must reach its destination quickly in both cases (owing to the information's sensitivity and the urgency with which it's needed), so scalability and efficiency are the most desired properties of the transfer.

Clearly, such scenarios involve disseminating large amounts of information from data producers (for example, the PDS engineering node) to data consumers (for example, ESA). Several emerging large-scale data dissemination technologies purport to efficiently transfer data from producers to consumers and to satisfy requirements such as those in our example scenario.

In our experience, however, some technologies are more amenable to different classes of distribution scenarios than others. For example, Grid technologies1 (such as GridFTP) are particularly well suited to handle fast, reliable, highly parallel data transfer over the public Internet. However, this benefit comes at the expense of running heavyweight infrastructure (for example, security trust authorities, Web servers, and metadata catalog services). On the other hand, peer-to-peer technologies, such as Bittorrent, are inherently more lightweight than Grid technologies and as fast, but they're less reliable and dependable. Distribution workflows must be able to decide, either with user feedback or autonomously, which data movement technology and dissemination pattern (for example, peer-to-peer or client-server) to employ to satisfy use cases such as the PDS data movement problem.

Our recent research has investigated how to construct distribution workflows that employ the appropriate technology for a given use case, in the context of JPL's Object Oriented Data Technology (OODT) data distribution framework.13 OODT provides services for packaging, repackaging, subsetting, and delivering large amounts of information, across heterogeneous organizational structures, to users around the world.
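The technology selection step can be sketched as a simple rule over scenario properties. This is an invented illustration, not any particular framework's actual decision logic; the Scenario fields, thresholds, and technology names are assumptions chosen to echo the trade-offs described above (heavyweight but dependable Grid transfer versus lightweight but less reliable peer-to-peer).

```python
# Illustrative decision sketch (names and thresholds are invented):
# choosing a data movement technology from a distribution scenario's
# properties, along the lines of the trade-offs discussed above.

from dataclasses import dataclass

@dataclass
class Scenario:
    total_volume_gb: float     # total data volume to deliver
    num_users: int             # number of data consumers
    num_intervals: int         # discrete data transfer events
    reliability_critical: bool # is dependable delivery required?

def select_technology(s: Scenario) -> str:
    """Trade off infrastructure weight against reliability and fan-out."""
    if s.reliability_critical and s.total_volume_gb > 100:
        # Heavyweight but dependable, highly parallel transfer.
        return "GridFTP"
    if s.num_users > 50 and not s.reliability_critical:
        # Lightweight peer-to-peer dissemination scales with consumers.
        return "BitTorrent"
    # Simple client-server default for modest scenarios.
    return "HTTP"

# A scenario resembling the PDS use case: large, sensitive, many users.
pds = Scenario(total_volume_gb=100.5, num_users=110,
               num_intervals=1000, reliability_critical=True)
print(select_technology(pds))  # GridFTP
```

A real decision framework would weigh many more properties (network capacity, security constraints, interval sizing), but the shape of the decision, scenario metadata in, dissemination technology out, is the same.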



In addition, our recent research at USC has led to the construction of the Data-Intensive Software Connectors (Disco) decision-making framework.14 Disco is a software extension to OODT that uses architectural information and data distribution metadata to autonomously decide the appropriate data movement technology to employ for a distribution scenario or class of scenarios. In our experience, the combination of a data movement infrastructure and a decision-making framework that can choose the underlying dissemination pattern and technology to satisfy scenario requirements is essential to successfully design and implement distribution workflows.

We plan to continue fundamental research into the science of scientific workflow technology. This includes developing more rigorous workflow models that can help bridge the discovery and production workflow environments.

We'll also continue to develop approaches to ease adoption of workflow technology, including improved integration of legacy scientific codes, better methodologies for adopting workflow modeling over script-based orchestration, and decision-making frameworks for delivering the resulting data products to the scientific community.

About the Authors
David Woollard is a cognizant design engineer at NASA's Jet Propulsion Laboratory and a third-year PhD candidate at the University of Southern California. He's also a primary developer on JPL's Object Oriented Data Technology project. His research interest is software architectural support for scientific computing, specifically support for integrating legacy scientific code into workflow environments. Woollard received his MS in computer science from USC. He's a member of the IEEE. Contact him at woollard@jpl.nasa.gov.

Nenad Medvidovic is an associate professor in the University of Southern California's Computer Science Department. His work focuses on software architecture modeling and analysis; middleware facilities for architectural implementation; product-line architectures; architectural styles; and architecture-level support for software development in highly distributed, mobile, resource-constrained, and embedded-computing environments. Medvidovic received his PhD in information and computer science from the University of California, Irvine. He's a member of the ACM and IEEE Computer Society. Contact him at neno@usc.edu.

Yolanda Gil is the associate division director for research in the Intelligent Systems Division at the University of Southern California's Information Sciences Institute and a research associate professor in USC's Computer Science Department. Her research interests include intelligent interfaces for knowledge-rich problem solving and scientific workflows in distributed environments. Gil received her PhD in computer science from Carnegie Mellon University. Contact her at gil@isi.edu.

Chris A. Mattmann is a member of key staff at NASA's Jet Propulsion Laboratory, working on instrument and science data systems on various earth science missions and informatics tasks. His research interests are primarily software architecture and large-scale data-intensive systems. Mattmann received his PhD in computer science from the University of Southern California. He's a member of the IEEE. Contact him at mattmann@jpl.nasa.gov.

Acknowledgments
We gratefully acknowledge support from the US National Science Foundation under grant CCF-0725332. Some research described in this article was conducted at the Jet Propulsion Laboratory, managed by the California Institute of Technology under a contract with the US National Aeronautics and Space Administration.

References
1. I. Foster et al., The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration, Globus Alliance, 2002; www.globus.org/alliance/publications/papers/ogsa.pdf.
2. C. Peltz, "Web Services Orchestration and Choreography," Computer, vol. 36, no. 10, 2003, pp. 46–52.
3. J. Yu and R. Buyya, "A Taxonomy of Scientific Workflow Systems for Grid Computing," ACM Sigmod Record, vol. 34, no. 3, 2005, pp. 44–49; www.sigmod.org/sigmod/record/issues/0509/p44-special-sw-section-7.pdf.
4. Y. Gil et al., "Examining the Challenges of Scientific Workflows," Computer, vol. 40, no. 12, 2007, pp. 24–32.
5. Y. Gil, "Workflow Composition: Semantic Representations for Flexible Automation," Workflows for E-science, D. Gannon et al., eds., Springer, 2006, pp. 244–257.
6. J. Kepner, "HPC Productivity: An Overarching View," Int'l J. High-Performance Computing Applications, vol. 18, no. 4, 2003, pp. 393–397.
7. J. Kim et al., "Provenance Trails in the Wings/Pegasus Workflow System," to be published in Concurrency and Computation: Practice and Experience, vol. 20, no. 5, 2008, pp. 587–597.
8. Y. Gil et al., "Wings for Pegasus: Creating Large-Scale Scientific Applications Using Semantic Representations of Computational Workflows," Proc. 19th Ann. Conf. Innovative Applications of Artificial Intelligence (IAAI 07), AAAI Press, 2007, pp. 1767–1774.
9. Y. Gil, P.A. Gonzalez-Calero, and E. Deelman, "On the Black Art of Designing Computational Workflows," Proc. 2nd Workshop Workflows in Support of Large-Scale Science (Works 07), ACM Press, 2007, pp. 53–62.
10. D. Woollard, Supporting In Silico Experimentation via Software Architecture, tech. report USC-CSSE-2008-813, Center for Systems and Software Eng., Univ. of Southern Calif., 2008; http://sunset.usc.edu/csse/TECHRPTS.
11. K.-K. Lau, M. Ornaghi, and Z. Wang, "A Software Component Model and Its Preliminary Formalisation," Proc. 4th Int'l Symp. Formal Methods for Components and Objects, Springer, 2006, pp. 1–21.
12. D. Crichton et al., "A Distributed Information Services Architecture to Support Biomarker Discovery in Early Detection of Cancer," Proc. 2nd IEEE Int'l Conf. E-science and Grid Computing, IEEE CS Press, 2006, p. 44.
13. C. Mattmann et al., "A Software Architecture-Based Framework for Highly Distributed and Data Intensive Scientific Applications," Proc. 28th Int'l Conf. Software Eng. (ICSE 06), IEEE CS Press, 2006, pp. 721–730.
14. C. Mattmann et al., "Software Connector Classification and Selection for Data-Intensive Systems," Proc. ICSE 2007 Workshop Incorporating COTS Software into Software Systems: Tools and Techniques (IWICSS), IEEE CS Press, 2007, p. 4.

